diff --git a/CRM/Increase customer satisfaction/1Notebook.ipynb b/CRM/Increase customer satisfaction/1Notebook.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..f245c97c15bff6b25c83c22cb51ffb203aac7955
--- /dev/null
+++ b/CRM/Increase customer satisfaction/1Notebook.ipynb	
@@ -0,0 +1,4263 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Business",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "# 1. Business Understanding\n",
+    "\n",
+    "Because of the large catalogue Netflix offers, users find it difficult to find films that suit them. Searching the library takes a lot of time and creates a poor user experience, which in turn leads to higher churn rates.\n",
+    "To reduce churn, we need to examine whether applying machine learning to film recommendations can increase customer satisfaction.\n",
+    "\n",
+    "\n",
+    "The dataset contains film data from the TMDB dataset.\n",
+    "Using the data on the films' popularity or rating, find out which factors can be acted on to develop strategies for the company.\n",
+    "Based on the business problem above, we define the dependent variable (y):\n",
+    "\n",
+    "Problem 1: y = popularity / vote average (regression problem)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Daten",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "# 2. Data and Data Understanding\n",
+    "\n",
+    "The dataset contains both numerical and categorical values. Each column describes the film in its row. For example, the \"crew\" column lists several contributors such as writers and film editors, while \"cast\" lists the actors who appear in each film. In addition, each film has a unique ID (movie_id in one file, id in the other); the two are identical, which makes it possible to combine both datasets. The data is largely self-explanatory, and its contents are described in detail on kaggle.com."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2.1 Importing the relevant modules"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Daten",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# This code block imports the libraries and modules used for data analysis, statistical modelling,\n",
+    "# machine learning, and visualization in Python.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import statsmodels.api as sm\n",
+    "import matplotlib.pyplot as plt\n",
+    "from sklearn.preprocessing import StandardScaler\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "from sklearn.tree import DecisionTreeClassifier\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "from sklearn import metrics\n",
+    "from sklearn.linear_model import LinearRegression\n",
+    "from sklearn.metrics import confusion_matrix, classification_report\n",
+    "import seaborn as sns\n",
+    "sns.set()\n",
+    "\n",
+    "\n",
+    "# statsmodels needs this function (chisqprob) from scipy for its reports; define it via chi2.sf\n",
+    "from scipy import stats\n",
+    "stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2.2 Reading the data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "original_data = pd.read_csv('https://storage.googleapis.com/ml-service-repository-datastorage/Increase_customer_satisfaction_tmdb_5000_movies.csv')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "original_data2 = pd.read_csv('https://storage.googleapis.com/ml-service-repository-datastorage/Increase_customer_satisfaction_tmdb_5000_credits.csv')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "original_data = pd.merge(original_data, original_data2)  # no key given: pandas joins on the column names shared by both frames"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Understanding the data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Daten",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# The table lists films with attributes such as\n",
+    "# budget, genres, homepage, ID, keywords, original language, original title, overview,\n",
+    "# popularity, production companies, runtime, spoken languages, status, tagline, title,\n",
+    "# average vote, vote count, movie ID, cast, and crew."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>budget</th>\n",
+       "      <th>genres</th>\n",
+       "      <th>homepage</th>\n",
+       "      <th>id</th>\n",
+       "      <th>keywords</th>\n",
+       "      <th>original_language</th>\n",
+       "      <th>original_title</th>\n",
+       "      <th>overview</th>\n",
+       "      <th>popularity</th>\n",
+       "      <th>production_companies</th>\n",
+       "      <th>...</th>\n",
+       "      <th>runtime</th>\n",
+       "      <th>spoken_languages</th>\n",
+       "      <th>status</th>\n",
+       "      <th>tagline</th>\n",
+       "      <th>title</th>\n",
+       "      <th>vote_average</th>\n",
+       "      <th>vote_count</th>\n",
+       "      <th>movie_id</th>\n",
+       "      <th>cast</th>\n",
+       "      <th>crew</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>237000000</td>\n",
+       "      <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",
+       "      <td>http://www.avatarmovie.com/</td>\n",
+       "      <td>19995</td>\n",
+       "      <td>[{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":...</td>\n",
+       "      <td>en</td>\n",
+       "      <td>Avatar</td>\n",
+       "      <td>In the 22nd century, a paraplegic Marine is di...</td>\n",
+       "      <td>150.437577</td>\n",
+       "      <td>[{\"name\": \"Ingenious Film Partners\", \"id\": 289...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>162.0</td>\n",
+       "      <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso...</td>\n",
+       "      <td>Released</td>\n",
+       "      <td>Enter the World of Pandora.</td>\n",
+       "      <td>Avatar</td>\n",
+       "      <td>7.2</td>\n",
+       "      <td>11800</td>\n",
+       "      <td>19995</td>\n",
+       "      <td>[{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...</td>\n",
+       "      <td>[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>300000000</td>\n",
+       "      <td>[{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"...</td>\n",
+       "      <td>http://disney.go.com/disneypictures/pirates/</td>\n",
+       "      <td>285</td>\n",
+       "      <td>[{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na...</td>\n",
+       "      <td>en</td>\n",
+       "      <td>Pirates of the Caribbean: At World's End</td>\n",
+       "      <td>Captain Barbossa, long believed to be dead, ha...</td>\n",
+       "      <td>139.082615</td>\n",
+       "      <td>[{\"name\": \"Walt Disney Pictures\", \"id\": 2}, {\"...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>169.0</td>\n",
+       "      <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}]</td>\n",
+       "      <td>Released</td>\n",
+       "      <td>At the end of the world, the adventure begins.</td>\n",
+       "      <td>Pirates of the Caribbean: At World's End</td>\n",
+       "      <td>6.9</td>\n",
+       "      <td>4500</td>\n",
+       "      <td>285</td>\n",
+       "      <td>[{\"cast_id\": 4, \"character\": \"Captain Jack Spa...</td>\n",
+       "      <td>[{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>245000000</td>\n",
+       "      <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",
+       "      <td>http://www.sonypictures.com/movies/spectre/</td>\n",
+       "      <td>206647</td>\n",
+       "      <td>[{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name...</td>\n",
+       "      <td>en</td>\n",
+       "      <td>Spectre</td>\n",
+       "      <td>A cryptic message from Bond’s past sends him o...</td>\n",
+       "      <td>107.376788</td>\n",
+       "      <td>[{\"name\": \"Columbia Pictures\", \"id\": 5}, {\"nam...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>148.0</td>\n",
+       "      <td>[{\"iso_639_1\": \"fr\", \"name\": \"Fran\\u00e7ais\"},...</td>\n",
+       "      <td>Released</td>\n",
+       "      <td>A Plan No One Escapes</td>\n",
+       "      <td>Spectre</td>\n",
+       "      <td>6.3</td>\n",
+       "      <td>4466</td>\n",
+       "      <td>206647</td>\n",
+       "      <td>[{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...</td>\n",
+       "      <td>[{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>250000000</td>\n",
+       "      <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam...</td>\n",
+       "      <td>http://www.thedarkknightrises.com/</td>\n",
+       "      <td>49026</td>\n",
+       "      <td>[{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,...</td>\n",
+       "      <td>en</td>\n",
+       "      <td>The Dark Knight Rises</td>\n",
+       "      <td>Following the death of District Attorney Harve...</td>\n",
+       "      <td>112.312950</td>\n",
+       "      <td>[{\"name\": \"Legendary Pictures\", \"id\": 923}, {\"...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>165.0</td>\n",
+       "      <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}]</td>\n",
+       "      <td>Released</td>\n",
+       "      <td>The Legend Ends</td>\n",
+       "      <td>The Dark Knight Rises</td>\n",
+       "      <td>7.6</td>\n",
+       "      <td>9106</td>\n",
+       "      <td>49026</td>\n",
+       "      <td>[{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...</td>\n",
+       "      <td>[{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>260000000</td>\n",
+       "      <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",
+       "      <td>http://movies.disney.com/john-carter</td>\n",
+       "      <td>49529</td>\n",
+       "      <td>[{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":...</td>\n",
+       "      <td>en</td>\n",
+       "      <td>John Carter</td>\n",
+       "      <td>John Carter is a war-weary, former military ca...</td>\n",
+       "      <td>43.926995</td>\n",
+       "      <td>[{\"name\": \"Walt Disney Pictures\", \"id\": 2}]</td>\n",
+       "      <td>...</td>\n",
+       "      <td>132.0</td>\n",
+       "      <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}]</td>\n",
+       "      <td>Released</td>\n",
+       "      <td>Lost in our world, found in another.</td>\n",
+       "      <td>John Carter</td>\n",
+       "      <td>6.1</td>\n",
+       "      <td>2124</td>\n",
+       "      <td>49529</td>\n",
+       "      <td>[{\"cast_id\": 5, \"character\": \"John Carter\", \"c...</td>\n",
+       "      <td>[{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>5 rows × 23 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "      budget                                             genres  \\\n",
+       "0  237000000  [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...   \n",
+       "1  300000000  [{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"...   \n",
+       "2  245000000  [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...   \n",
+       "3  250000000  [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam...   \n",
+       "4  260000000  [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...   \n",
+       "\n",
+       "                                       homepage      id  \\\n",
+       "0                   http://www.avatarmovie.com/   19995   \n",
+       "1  http://disney.go.com/disneypictures/pirates/     285   \n",
+       "2   http://www.sonypictures.com/movies/spectre/  206647   \n",
+       "3            http://www.thedarkknightrises.com/   49026   \n",
+       "4          http://movies.disney.com/john-carter   49529   \n",
+       "\n",
+       "                                            keywords original_language  \\\n",
+       "0  [{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":...                en   \n",
+       "1  [{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na...                en   \n",
+       "2  [{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name...                en   \n",
+       "3  [{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,...                en   \n",
+       "4  [{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":...                en   \n",
+       "\n",
+       "                             original_title  \\\n",
+       "0                                    Avatar   \n",
+       "1  Pirates of the Caribbean: At World's End   \n",
+       "2                                   Spectre   \n",
+       "3                     The Dark Knight Rises   \n",
+       "4                               John Carter   \n",
+       "\n",
+       "                                            overview  popularity  \\\n",
+       "0  In the 22nd century, a paraplegic Marine is di...  150.437577   \n",
+       "1  Captain Barbossa, long believed to be dead, ha...  139.082615   \n",
+       "2  A cryptic message from Bond’s past sends him o...  107.376788   \n",
+       "3  Following the death of District Attorney Harve...  112.312950   \n",
+       "4  John Carter is a war-weary, former military ca...   43.926995   \n",
+       "\n",
+       "                                production_companies  ... runtime  \\\n",
+       "0  [{\"name\": \"Ingenious Film Partners\", \"id\": 289...  ...   162.0   \n",
+       "1  [{\"name\": \"Walt Disney Pictures\", \"id\": 2}, {\"...  ...   169.0   \n",
+       "2  [{\"name\": \"Columbia Pictures\", \"id\": 5}, {\"nam...  ...   148.0   \n",
+       "3  [{\"name\": \"Legendary Pictures\", \"id\": 923}, {\"...  ...   165.0   \n",
+       "4        [{\"name\": \"Walt Disney Pictures\", \"id\": 2}]  ...   132.0   \n",
+       "\n",
+       "                                    spoken_languages    status  \\\n",
+       "0  [{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso...  Released   \n",
+       "1           [{\"iso_639_1\": \"en\", \"name\": \"English\"}]  Released   \n",
+       "2  [{\"iso_639_1\": \"fr\", \"name\": \"Fran\\u00e7ais\"},...  Released   \n",
+       "3           [{\"iso_639_1\": \"en\", \"name\": \"English\"}]  Released   \n",
+       "4           [{\"iso_639_1\": \"en\", \"name\": \"English\"}]  Released   \n",
+       "\n",
+       "                                          tagline  \\\n",
+       "0                     Enter the World of Pandora.   \n",
+       "1  At the end of the world, the adventure begins.   \n",
+       "2                           A Plan No One Escapes   \n",
+       "3                                 The Legend Ends   \n",
+       "4            Lost in our world, found in another.   \n",
+       "\n",
+       "                                      title vote_average vote_count movie_id  \\\n",
+       "0                                    Avatar          7.2      11800    19995   \n",
+       "1  Pirates of the Caribbean: At World's End          6.9       4500      285   \n",
+       "2                                   Spectre          6.3       4466   206647   \n",
+       "3                     The Dark Knight Rises          7.6       9106    49026   \n",
+       "4                               John Carter          6.1       2124    49529   \n",
+       "\n",
+       "                                                cast  \\\n",
+       "0  [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...   \n",
+       "1  [{\"cast_id\": 4, \"character\": \"Captain Jack Spa...   \n",
+       "2  [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...   \n",
+       "3  [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...   \n",
+       "4  [{\"cast_id\": 5, \"character\": \"John Carter\", \"c...   \n",
+       "\n",
+       "                                                crew  \n",
+       "0  [{\"credit_id\": \"52fe48009251416c750aca23\", \"de...  \n",
+       "1  [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...  \n",
+       "2  [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...  \n",
+       "3  [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...  \n",
+       "4  [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...  \n",
+       "\n",
+       "[5 rows x 23 columns]"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "original_data.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Daten",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# The table summarizes the attributes of the films,\n",
+    "# such as budget, genres, homepage, ID, keywords, original language, original title, overview,\n",
+    "# popularity, production companies, runtime, spoken languages, status, tagline, title,\n",
+    "# average vote, vote count, movie ID, cast, and crew."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>budget</th>\n",
+       "      <th>genres</th>\n",
+       "      <th>homepage</th>\n",
+       "      <th>id</th>\n",
+       "      <th>keywords</th>\n",
+       "      <th>original_language</th>\n",
+       "      <th>original_title</th>\n",
+       "      <th>overview</th>\n",
+       "      <th>popularity</th>\n",
+       "      <th>production_companies</th>\n",
+       "      <th>...</th>\n",
+       "      <th>runtime</th>\n",
+       "      <th>spoken_languages</th>\n",
+       "      <th>status</th>\n",
+       "      <th>tagline</th>\n",
+       "      <th>title</th>\n",
+       "      <th>vote_average</th>\n",
+       "      <th>vote_count</th>\n",
+       "      <th>movie_id</th>\n",
+       "      <th>cast</th>\n",
+       "      <th>crew</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>count</th>\n",
+       "      <td>4.809000e+03</td>\n",
+       "      <td>4809</td>\n",
+       "      <td>1713</td>\n",
+       "      <td>4809.000000</td>\n",
+       "      <td>4809</td>\n",
+       "      <td>4809</td>\n",
+       "      <td>4809</td>\n",
+       "      <td>4806</td>\n",
+       "      <td>4809.000000</td>\n",
+       "      <td>4809</td>\n",
+       "      <td>...</td>\n",
+       "      <td>4807.000000</td>\n",
+       "      <td>4809</td>\n",
+       "      <td>4809</td>\n",
+       "      <td>3965</td>\n",
+       "      <td>4809</td>\n",
+       "      <td>4809.000000</td>\n",
+       "      <td>4809.000000</td>\n",
+       "      <td>4809.000000</td>\n",
+       "      <td>4809</td>\n",
+       "      <td>4809</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>unique</th>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1175</td>\n",
+       "      <td>1691</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>4222</td>\n",
+       "      <td>37</td>\n",
+       "      <td>4801</td>\n",
+       "      <td>4800</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>3697</td>\n",
+       "      <td>...</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>544</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3944</td>\n",
+       "      <td>4800</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>4761</td>\n",
+       "      <td>4776</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>top</th>\n",
+       "      <td>NaN</td>\n",
+       "      <td>[{\"id\": 18, \"name\": \"Drama\"}]</td>\n",
+       "      <td>http://www.missionimpossible.com/</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>[]</td>\n",
+       "      <td>en</td>\n",
+       "      <td>Out of the Blue</td>\n",
+       "      <td>Dennis Hopper is a hard-drinking truck driver ...</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>[]</td>\n",
+       "      <td>...</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}]</td>\n",
+       "      <td>Released</td>\n",
+       "      <td>Based on a true story.</td>\n",
+       "      <td>Out of the Blue</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>[]</td>\n",
+       "      <td>[]</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>freq</th>\n",
+       "      <td>NaN</td>\n",
+       "      <td>372</td>\n",
+       "      <td>4</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>412</td>\n",
+       "      <td>4510</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>352</td>\n",
+       "      <td>...</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>3175</td>\n",
+       "      <td>4801</td>\n",
+       "      <td>3</td>\n",
+       "      <td>4</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>43</td>\n",
+       "      <td>28</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>mean</th>\n",
+       "      <td>2.902780e+07</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>57120.571429</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>21.491664</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>...</td>\n",
+       "      <td>106.882255</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>6.092514</td>\n",
+       "      <td>690.331670</td>\n",
+       "      <td>57120.571429</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>std</th>\n",
+       "      <td>4.070473e+07</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>88653.369849</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>31.803366</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>...</td>\n",
+       "      <td>22.602535</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.193989</td>\n",
+       "      <td>1234.187111</td>\n",
+       "      <td>88653.369849</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>min</th>\n",
+       "      <td>0.000000e+00</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>5.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.000000</td>\n",
+       "      <td>0.000000</td>\n",
+       "      <td>5.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>25%</th>\n",
+       "      <td>7.800000e+05</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>9012.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>4.667230</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>...</td>\n",
+       "      <td>94.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>5.600000</td>\n",
+       "      <td>54.000000</td>\n",
+       "      <td>9012.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>50%</th>\n",
+       "      <td>1.500000e+07</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>14624.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>12.921594</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>...</td>\n",
+       "      <td>103.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>6.200000</td>\n",
+       "      <td>235.000000</td>\n",
+       "      <td>14624.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>75%</th>\n",
+       "      <td>4.000000e+07</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>58595.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>28.350529</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>...</td>\n",
+       "      <td>118.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>6.800000</td>\n",
+       "      <td>737.000000</td>\n",
+       "      <td>58595.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>max</th>\n",
+       "      <td>3.800000e+08</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>459488.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>875.581305</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>...</td>\n",
+       "      <td>338.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>10.000000</td>\n",
+       "      <td>13752.000000</td>\n",
+       "      <td>459488.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>11 rows × 23 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "              budget                         genres  \\\n",
+       "count   4.809000e+03                           4809   \n",
+       "unique           NaN                           1175   \n",
+       "top              NaN  [{\"id\": 18, \"name\": \"Drama\"}]   \n",
+       "freq             NaN                            372   \n",
+       "mean    2.902780e+07                            NaN   \n",
+       "std     4.070473e+07                            NaN   \n",
+       "min     0.000000e+00                            NaN   \n",
+       "25%     7.800000e+05                            NaN   \n",
+       "50%     1.500000e+07                            NaN   \n",
+       "75%     4.000000e+07                            NaN   \n",
+       "max     3.800000e+08                            NaN   \n",
+       "\n",
+       "                                 homepage             id keywords  \\\n",
+       "count                                1713    4809.000000     4809   \n",
+       "unique                               1691            NaN     4222   \n",
+       "top     http://www.missionimpossible.com/            NaN       []   \n",
+       "freq                                    4            NaN      412   \n",
+       "mean                                  NaN   57120.571429      NaN   \n",
+       "std                                   NaN   88653.369849      NaN   \n",
+       "min                                   NaN       5.000000      NaN   \n",
+       "25%                                   NaN    9012.000000      NaN   \n",
+       "50%                                   NaN   14624.000000      NaN   \n",
+       "75%                                   NaN   58595.000000      NaN   \n",
+       "max                                   NaN  459488.000000      NaN   \n",
+       "\n",
+       "       original_language   original_title  \\\n",
+       "count               4809             4809   \n",
+       "unique                37             4801   \n",
+       "top                   en  Out of the Blue   \n",
+       "freq                4510                4   \n",
+       "mean                 NaN              NaN   \n",
+       "std                  NaN              NaN   \n",
+       "min                  NaN              NaN   \n",
+       "25%                  NaN              NaN   \n",
+       "50%                  NaN              NaN   \n",
+       "75%                  NaN              NaN   \n",
+       "max                  NaN              NaN   \n",
+       "\n",
+       "                                                 overview   popularity  \\\n",
+       "count                                                4806  4809.000000   \n",
+       "unique                                               4800          NaN   \n",
+       "top     Dennis Hopper is a hard-drinking truck driver ...          NaN   \n",
+       "freq                                                    2          NaN   \n",
+       "mean                                                  NaN    21.491664   \n",
+       "std                                                   NaN    31.803366   \n",
+       "min                                                   NaN     0.000000   \n",
+       "25%                                                   NaN     4.667230   \n",
+       "50%                                                   NaN    12.921594   \n",
+       "75%                                                   NaN    28.350529   \n",
+       "max                                                   NaN   875.581305   \n",
+       "\n",
+       "       production_companies  ...      runtime  \\\n",
+       "count                  4809  ...  4807.000000   \n",
+       "unique                 3697  ...          NaN   \n",
+       "top                      []  ...          NaN   \n",
+       "freq                    352  ...          NaN   \n",
+       "mean                    NaN  ...   106.882255   \n",
+       "std                     NaN  ...    22.602535   \n",
+       "min                     NaN  ...     0.000000   \n",
+       "25%                     NaN  ...    94.000000   \n",
+       "50%                     NaN  ...   103.000000   \n",
+       "75%                     NaN  ...   118.000000   \n",
+       "max                     NaN  ...   338.000000   \n",
+       "\n",
+       "                                spoken_languages    status  \\\n",
+       "count                                       4809      4809   \n",
+       "unique                                       544         3   \n",
+       "top     [{\"iso_639_1\": \"en\", \"name\": \"English\"}]  Released   \n",
+       "freq                                        3175      4801   \n",
+       "mean                                         NaN       NaN   \n",
+       "std                                          NaN       NaN   \n",
+       "min                                          NaN       NaN   \n",
+       "25%                                          NaN       NaN   \n",
+       "50%                                          NaN       NaN   \n",
+       "75%                                          NaN       NaN   \n",
+       "max                                          NaN       NaN   \n",
+       "\n",
+       "                       tagline            title vote_average    vote_count  \\\n",
+       "count                     3965             4809  4809.000000   4809.000000   \n",
+       "unique                    3944             4800          NaN           NaN   \n",
+       "top     Based on a true story.  Out of the Blue          NaN           NaN   \n",
+       "freq                         3                4          NaN           NaN   \n",
+       "mean                       NaN              NaN     6.092514    690.331670   \n",
+       "std                        NaN              NaN     1.193989   1234.187111   \n",
+       "min                        NaN              NaN     0.000000      0.000000   \n",
+       "25%                        NaN              NaN     5.600000     54.000000   \n",
+       "50%                        NaN              NaN     6.200000    235.000000   \n",
+       "75%                        NaN              NaN     6.800000    737.000000   \n",
+       "max                        NaN              NaN    10.000000  13752.000000   \n",
+       "\n",
+       "             movie_id  cast  crew  \n",
+       "count     4809.000000  4809  4809  \n",
+       "unique            NaN  4761  4776  \n",
+       "top               NaN    []    []  \n",
+       "freq              NaN    43    28  \n",
+       "mean     57120.571429   NaN   NaN  \n",
+       "std      88653.369849   NaN   NaN  \n",
+       "min          5.000000   NaN   NaN  \n",
+       "25%       9012.000000   NaN   NaN  \n",
+       "50%      14624.000000   NaN   NaN  \n",
+       "75%      58595.000000   NaN   NaN  \n",
+       "max     459488.000000   NaN   NaN  \n",
+       "\n",
+       "[11 rows x 23 columns]"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "original_data.describe(include=\"all\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Daten",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Summary of the pandas DataFrame: 4809 entries and 23 columns holding film information such as budget, genres,\n",
+    "# homepage, ID, keywords, language, title, popularity, production companies, revenue, runtime, rating, vote count, cast, and crew."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<class 'pandas.core.frame.DataFrame'>\n",
+      "Int64Index: 4809 entries, 0 to 4808\n",
+      "Data columns (total 23 columns):\n",
+      " #   Column                Non-Null Count  Dtype  \n",
+      "---  ------                --------------  -----  \n",
+      " 0   budget                4809 non-null   int64  \n",
+      " 1   genres                4809 non-null   object \n",
+      " 2   homepage              1713 non-null   object \n",
+      " 3   id                    4809 non-null   int64  \n",
+      " 4   keywords              4809 non-null   object \n",
+      " 5   original_language     4809 non-null   object \n",
+      " 6   original_title        4809 non-null   object \n",
+      " 7   overview              4806 non-null   object \n",
+      " 8   popularity            4809 non-null   float64\n",
+      " 9   production_companies  4809 non-null   object \n",
+      " 10  production_countries  4809 non-null   object \n",
+      " 11  release_date          4808 non-null   object \n",
+      " 12  revenue               4809 non-null   int64  \n",
+      " 13  runtime               4807 non-null   float64\n",
+      " 14  spoken_languages      4809 non-null   object \n",
+      " 15  status                4809 non-null   object \n",
+      " 16  tagline               3965 non-null   object \n",
+      " 17  title                 4809 non-null   object \n",
+      " 18  vote_average          4809 non-null   float64\n",
+      " 19  vote_count            4809 non-null   int64  \n",
+      " 20  movie_id              4809 non-null   int64  \n",
+      " 21  cast                  4809 non-null   object \n",
+      " 22  crew                  4809 non-null   object \n",
+      "dtypes: float64(3), int64(5), object(15)\n",
+      "memory usage: 901.7+ KB\n"
+     ]
+    }
+   ],
+   "source": [
+    "original_data.info()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2.3 Data Cleaning"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Check for Null Values"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Daten",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# The table shows the number of missing values per column of the DataFrame: 'homepage' has 3096 missing values and 'tagline' 844,\n",
+    "# while most other columns have none."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "budget                     0\n",
+       "genres                     0\n",
+       "homepage                3096\n",
+       "id                         0\n",
+       "keywords                   0\n",
+       "original_language          0\n",
+       "original_title             0\n",
+       "overview                   3\n",
+       "popularity                 0\n",
+       "production_companies       0\n",
+       "production_countries       0\n",
+       "release_date               1\n",
+       "revenue                    0\n",
+       "runtime                    2\n",
+       "spoken_languages           0\n",
+       "status                     0\n",
+       "tagline                  844\n",
+       "title                      0\n",
+       "vote_average               0\n",
+       "vote_count                 0\n",
+       "movie_id                   0\n",
+       "cast                       0\n",
+       "crew                       0\n",
+       "dtype: int64"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "original_data.isnull().sum()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
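+    "# Drop every row with at least one missing value (this also discards the 3096 rows without a 'homepage')\n",
+    "df_wo_null = original_data.dropna(axis=0)"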
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "budget                  0\n",
+       "genres                  0\n",
+       "homepage                0\n",
+       "id                      0\n",
+       "keywords                0\n",
+       "original_language       0\n",
+       "original_title          0\n",
+       "overview                0\n",
+       "popularity              0\n",
+       "production_companies    0\n",
+       "production_countries    0\n",
+       "release_date            0\n",
+       "revenue                 0\n",
+       "runtime                 0\n",
+       "spoken_languages        0\n",
+       "status                  0\n",
+       "tagline                 0\n",
+       "title                   0\n",
+       "vote_average            0\n",
+       "vote_count              0\n",
+       "movie_id                0\n",
+       "cast                    0\n",
+       "crew                    0\n",
+       "dtype: int64"
+      ]
+     },
+     "execution_count": 10,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_wo_null.isnull().sum()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Check for Duplicates"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>budget</th>\n",
+       "      <th>genres</th>\n",
+       "      <th>homepage</th>\n",
+       "      <th>id</th>\n",
+       "      <th>keywords</th>\n",
+       "      <th>original_language</th>\n",
+       "      <th>original_title</th>\n",
+       "      <th>overview</th>\n",
+       "      <th>popularity</th>\n",
+       "      <th>production_companies</th>\n",
+       "      <th>...</th>\n",
+       "      <th>runtime</th>\n",
+       "      <th>spoken_languages</th>\n",
+       "      <th>status</th>\n",
+       "      <th>tagline</th>\n",
+       "      <th>title</th>\n",
+       "      <th>vote_average</th>\n",
+       "      <th>vote_count</th>\n",
+       "      <th>movie_id</th>\n",
+       "      <th>cast</th>\n",
+       "      <th>crew</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>0 rows × 23 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "Empty DataFrame\n",
+       "Columns: [budget, genres, homepage, id, keywords, original_language, original_title, overview, popularity, production_companies, production_countries, release_date, revenue, runtime, spoken_languages, status, tagline, title, vote_average, vote_count, movie_id, cast, crew]\n",
+       "Index: []\n",
+       "\n",
+       "[0 rows x 23 columns]"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_wo_null[df_wo_null.duplicated(keep=False)]"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2.4 Test for Multicollinearity"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Daten",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# The null check on df_wo_null shows that no missing values remain in any column (budget, genres, homepage, etc.) after the incomplete rows were dropped."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "No non-significant variables remain. The final model is built."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Daten",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# The code below draws a heatmap of the correlations between the numeric columns of the DataFrame df_wo_null."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       "<Figure size 1296x1296 with 2 Axes>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
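+    "f, ax = plt.subplots(figsize=(18, 18))\n",
+    "# Correlate numeric columns only; pandas 2.x raises an error when object columns are passed to corr()\n",
+    "sns.heatmap(df_wo_null.select_dtypes(include=\"number\").corr(), annot=True, linewidths=0.5, fmt=\".1f\", ax=ax)\n",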
+    "plt.xticks(rotation=90)\n",
+    "plt.yticks(rotation=0)\n",
+    "plt.title('Correlation Map')\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# The code below removes the columns 'tagline', 'homepage', 'id', and 'movie_id' from the DataFrame df_wo_null.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_wo_null = df_wo_null.drop(['tagline', 'homepage', 'id', 'movie_id'], axis = 1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Daten",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# The code below draws a large heatmap of the correlations between the remaining numeric columns\n",
+    "# of df_wo_null, with annotations, grid lines, and a title.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       "<Figure size 1296x1296 with 2 Axes>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
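+    "f, ax = plt.subplots(figsize=(18, 18))\n",
+    "# Correlate numeric columns only; pandas 2.x raises an error when object columns are passed to corr()\n",
+    "sns.heatmap(df_wo_null.select_dtypes(include=\"number\").corr(), annot=True, linewidths=0.5, fmt=\".1f\", ax=ax)\n",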
+    "plt.xticks(rotation=90)\n",
+    "plt.yticks(rotation=0)\n",
+    "plt.title('Correlation Map')\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2.5 Descriptive Analysis"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_wo_null = df_wo_null.drop(['status', 'original_title', 'overview'], axis = 1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_wo_null = df_wo_null.drop(['production_countries', 'original_language', 'crew', 'spoken_languages'], axis = 1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_wo_null = df_wo_null.drop(['runtime', 'keywords', 'vote_average', 'budget'], axis = 1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Daten",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# The table lists films with their genres, popularity, production companies, release date, revenue, title,\n",
+    "# vote count, and cast."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>genres</th>\n",
+       "      <th>popularity</th>\n",
+       "      <th>production_companies</th>\n",
+       "      <th>release_date</th>\n",
+       "      <th>revenue</th>\n",
+       "      <th>title</th>\n",
+       "      <th>vote_count</th>\n",
+       "      <th>cast</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",
+       "      <td>150.437577</td>\n",
+       "      <td>[{\"name\": \"Ingenious Film Partners\", \"id\": 289...</td>\n",
+       "      <td>2009-12-10</td>\n",
+       "      <td>2787965087</td>\n",
+       "      <td>Avatar</td>\n",
+       "      <td>11800</td>\n",
+       "      <td>[{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>[{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"...</td>\n",
+       "      <td>139.082615</td>\n",
+       "      <td>[{\"name\": \"Walt Disney Pictures\", \"id\": 2}, {\"...</td>\n",
+       "      <td>2007-05-19</td>\n",
+       "      <td>961000000</td>\n",
+       "      <td>Pirates of the Caribbean: At World's End</td>\n",
+       "      <td>4500</td>\n",
+       "      <td>[{\"cast_id\": 4, \"character\": \"Captain Jack Spa...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",
+       "      <td>107.376788</td>\n",
+       "      <td>[{\"name\": \"Columbia Pictures\", \"id\": 5}, {\"nam...</td>\n",
+       "      <td>2015-10-26</td>\n",
+       "      <td>880674609</td>\n",
+       "      <td>Spectre</td>\n",
+       "      <td>4466</td>\n",
+       "      <td>[{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam...</td>\n",
+       "      <td>112.312950</td>\n",
+       "      <td>[{\"name\": \"Legendary Pictures\", \"id\": 923}, {\"...</td>\n",
+       "      <td>2012-07-16</td>\n",
+       "      <td>1084939099</td>\n",
+       "      <td>The Dark Knight Rises</td>\n",
+       "      <td>9106</td>\n",
+       "      <td>[{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",
+       "      <td>43.926995</td>\n",
+       "      <td>[{\"name\": \"Walt Disney Pictures\", \"id\": 2}]</td>\n",
+       "      <td>2012-03-07</td>\n",
+       "      <td>284139100</td>\n",
+       "      <td>John Carter</td>\n",
+       "      <td>2124</td>\n",
+       "      <td>[{\"cast_id\": 5, \"character\": \"John Carter\", \"c...</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                                              genres  popularity  \\\n",
+       "0  [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...  150.437577   \n",
+       "1  [{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"...  139.082615   \n",
+       "2  [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...  107.376788   \n",
+       "3  [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam...  112.312950   \n",
+       "4  [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...   43.926995   \n",
+       "\n",
+       "                                production_companies release_date     revenue  \\\n",
+       "0  [{\"name\": \"Ingenious Film Partners\", \"id\": 289...   2009-12-10  2787965087   \n",
+       "1  [{\"name\": \"Walt Disney Pictures\", \"id\": 2}, {\"...   2007-05-19   961000000   \n",
+       "2  [{\"name\": \"Columbia Pictures\", \"id\": 5}, {\"nam...   2015-10-26   880674609   \n",
+       "3  [{\"name\": \"Legendary Pictures\", \"id\": 923}, {\"...   2012-07-16  1084939099   \n",
+       "4        [{\"name\": \"Walt Disney Pictures\", \"id\": 2}]   2012-03-07   284139100   \n",
+       "\n",
+       "                                      title  vote_count  \\\n",
+       "0                                    Avatar       11800   \n",
+       "1  Pirates of the Caribbean: At World's End        4500   \n",
+       "2                                   Spectre        4466   \n",
+       "3                     The Dark Knight Rises        9106   \n",
+       "4                               John Carter        2124   \n",
+       "\n",
+       "                                                cast  \n",
+       "0  [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...  \n",
+       "1  [{\"cast_id\": 4, \"character\": \"Captain Jack Spa...  \n",
+       "2  [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...  \n",
+       "3  [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...  \n",
+       "4  [{\"cast_id\": 5, \"character\": \"John Carter\", \"c...  "
+      ]
+     },
+     "execution_count": 18,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_wo_null.head()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenvorbereitung",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "# 3. Data Preparation"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3.1 Extracting Categorical Variables"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {},
+   "outputs": [],
+   "source": [
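+    "# Take the second comma-separated token of the genres JSON string, i.e. the \"name\" field of the first genre\n",
+    "df_wo_null['genre1'] = df_wo_null['genres'].str.split(',').str[1]"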
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_wo_null['genre1'] = df_wo_null['genre1'].str.split(':').str[1]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_wo_null['genre1'] = df_wo_null['genre1'].str.split('\"').str[1]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "metadata": {},
+   "outputs": [],
+   "source": [
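+    "# The fourth token is the \"name\" field of the second genre (NaN if the film has fewer than two genres)\n",
+    "df_wo_null['genre2'] = df_wo_null['genres'].str.split(',').str[3]"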
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_wo_null['genre2'] = df_wo_null['genre2'].str.split(':').str[1]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_wo_null['genre2'] = df_wo_null['genre2'].str.split('\"').str[1]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_1 = df_wo_null.drop(['genres'], axis = 1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenvorbereitung",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# The table shows the number of missing values per column of the DataFrame:\n",
+    "# 'genre1' has 2 and 'genre2' has 232 missing values, while 'popularity' and the other columns have none.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "popularity                0\n",
+       "production_companies      0\n",
+       "release_date              0\n",
+       "revenue                   0\n",
+       "title                     0\n",
+       "vote_count                0\n",
+       "cast                      0\n",
+       "genre1                    2\n",
+       "genre2                  232\n",
+       "dtype: int64"
+      ]
+     },
+     "execution_count": 26,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_1.isnull().sum()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Join both genres into one label; rows with a missing genre1 or genre2 become NaN\n",
+    "df_1[\"new_genres\"] = df_1[\"genre1\"] + \",\" + df_1[\"genre2\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_2 = df_1.drop(['genre1', 'genre2'], axis = 1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenvorbereitung",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# The table shows the number of missing values per column of the DataFrame:\n",
+    "# 'new_genres' has 232 missing values, while the other listed columns have none.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "popularity                0\n",
+       "production_companies      0\n",
+       "release_date              0\n",
+       "revenue                   0\n",
+       "title                     0\n",
+       "vote_count                0\n",
+       "cast                      0\n",
+       "new_genres              232\n",
+       "dtype: int64"
+      ]
+     },
+     "execution_count": 29,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_2.isnull().sum()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Drop the 232 rows without a combined genre label\n",
+    "df_2 = df_2.dropna(axis=0)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_2['year'] = df_2['release_date'].str.split('-').str[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_2['year'] = df_2['year'].astype(int)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenvorbereitung",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# The table describes a pandas DataFrame with 1262 entries and 9 columns\n",
+    "# holding film attributes: popularity, production companies, release date,\n",
+    "# revenue, title, vote count, cast, new_genres, and year. No column has missing values."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<class 'pandas.core.frame.DataFrame'>\n",
+      "Int64Index: 1262 entries, 0 to 4802\n",
+      "Data columns (total 9 columns):\n",
+      " #   Column                Non-Null Count  Dtype  \n",
+      "---  ------                --------------  -----  \n",
+      " 0   popularity            1262 non-null   float64\n",
+      " 1   production_companies  1262 non-null   object \n",
+      " 2   release_date          1262 non-null   object \n",
+      " 3   revenue               1262 non-null   int64  \n",
+      " 4   title                 1262 non-null   object \n",
+      " 5   vote_count            1262 non-null   int64  \n",
+      " 6   cast                  1262 non-null   object \n",
+      " 7   new_genres            1262 non-null   object \n",
+      " 8   year                  1262 non-null   int64  \n",
+      "dtypes: float64(1), int64(3), object(5)\n",
+      "memory usage: 98.6+ KB\n"
+     ]
+    }
+   ],
+   "source": [
+    "df_2.info()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_3 = df_2.drop(['release_date'], axis = 1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenvorbereitung",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "After dropping `release_date`, the table holds 8 attributes for each of the 1262 films: popularity, production companies, \n",
+    "revenue, title, vote count, cast, genres and year."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>popularity</th>\n",
+       "      <th>production_companies</th>\n",
+       "      <th>revenue</th>\n",
+       "      <th>title</th>\n",
+       "      <th>vote_count</th>\n",
+       "      <th>cast</th>\n",
+       "      <th>new_genres</th>\n",
+       "      <th>year</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>150.437577</td>\n",
+       "      <td>[{\"name\": \"Ingenious Film Partners\", \"id\": 289...</td>\n",
+       "      <td>2787965087</td>\n",
+       "      <td>Avatar</td>\n",
+       "      <td>11800</td>\n",
+       "      <td>[{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...</td>\n",
+       "      <td>Action,Adventure</td>\n",
+       "      <td>2009</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>139.082615</td>\n",
+       "      <td>[{\"name\": \"Walt Disney Pictures\", \"id\": 2}, {\"...</td>\n",
+       "      <td>961000000</td>\n",
+       "      <td>Pirates of the Caribbean: At World's End</td>\n",
+       "      <td>4500</td>\n",
+       "      <td>[{\"cast_id\": 4, \"character\": \"Captain Jack Spa...</td>\n",
+       "      <td>Adventure,Fantasy</td>\n",
+       "      <td>2007</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>107.376788</td>\n",
+       "      <td>[{\"name\": \"Columbia Pictures\", \"id\": 5}, {\"nam...</td>\n",
+       "      <td>880674609</td>\n",
+       "      <td>Spectre</td>\n",
+       "      <td>4466</td>\n",
+       "      <td>[{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...</td>\n",
+       "      <td>Action,Adventure</td>\n",
+       "      <td>2015</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>112.312950</td>\n",
+       "      <td>[{\"name\": \"Legendary Pictures\", \"id\": 923}, {\"...</td>\n",
+       "      <td>1084939099</td>\n",
+       "      <td>The Dark Knight Rises</td>\n",
+       "      <td>9106</td>\n",
+       "      <td>[{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...</td>\n",
+       "      <td>Action,Crime</td>\n",
+       "      <td>2012</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>43.926995</td>\n",
+       "      <td>[{\"name\": \"Walt Disney Pictures\", \"id\": 2}]</td>\n",
+       "      <td>284139100</td>\n",
+       "      <td>John Carter</td>\n",
+       "      <td>2124</td>\n",
+       "      <td>[{\"cast_id\": 5, \"character\": \"John Carter\", \"c...</td>\n",
+       "      <td>Action,Adventure</td>\n",
+       "      <td>2012</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>...</th>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4764</th>\n",
+       "      <td>27.662696</td>\n",
+       "      <td>[{\"name\": \"Automatik Entertainment\", \"id\": 281...</td>\n",
+       "      <td>600896</td>\n",
+       "      <td>The Signal</td>\n",
+       "      <td>631</td>\n",
+       "      <td>[{\"cast_id\": 1, \"character\": \"Nic Eastman\", \"c...</td>\n",
+       "      <td>Thriller,Science Fiction</td>\n",
+       "      <td>2014</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4772</th>\n",
+       "      <td>3.277287</td>\n",
+       "      <td>[{\"name\": \"FM Productions\", \"id\": 12601}, {\"na...</td>\n",
+       "      <td>321952</td>\n",
+       "      <td>The Last Waltz</td>\n",
+       "      <td>64</td>\n",
+       "      <td>[{\"cast_id\": 1, \"character\": \"Himself\", \"credi...</td>\n",
+       "      <td>Documentary,Music</td>\n",
+       "      <td>1978</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4778</th>\n",
+       "      <td>1.330379</td>\n",
+       "      <td>[]</td>\n",
+       "      <td>10000</td>\n",
+       "      <td>Down Terrace</td>\n",
+       "      <td>26</td>\n",
+       "      <td>[{\"cast_id\": 4, \"character\": \"Bill\", \"credit_i...</td>\n",
+       "      <td>Drama,Action</td>\n",
+       "      <td>2009</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4787</th>\n",
+       "      <td>0.048948</td>\n",
+       "      <td>[]</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Dry Spell</td>\n",
+       "      <td>1</td>\n",
+       "      <td>[{\"cast_id\": 4, \"character\": \"Sasha\", \"credit_...</td>\n",
+       "      <td>Comedy,Romance</td>\n",
+       "      <td>2013</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4802</th>\n",
+       "      <td>23.307949</td>\n",
+       "      <td>[{\"name\": \"Thinkfilm\", \"id\": 446}]</td>\n",
+       "      <td>424760</td>\n",
+       "      <td>Primer</td>\n",
+       "      <td>658</td>\n",
+       "      <td>[{\"cast_id\": 1, \"character\": \"Aaron\", \"credit_...</td>\n",
+       "      <td>Science Fiction,Drama</td>\n",
+       "      <td>2004</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>1262 rows × 8 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "      popularity                               production_companies  \\\n",
+       "0     150.437577  [{\"name\": \"Ingenious Film Partners\", \"id\": 289...   \n",
+       "1     139.082615  [{\"name\": \"Walt Disney Pictures\", \"id\": 2}, {\"...   \n",
+       "2     107.376788  [{\"name\": \"Columbia Pictures\", \"id\": 5}, {\"nam...   \n",
+       "3     112.312950  [{\"name\": \"Legendary Pictures\", \"id\": 923}, {\"...   \n",
+       "4      43.926995        [{\"name\": \"Walt Disney Pictures\", \"id\": 2}]   \n",
+       "...          ...                                                ...   \n",
+       "4764   27.662696  [{\"name\": \"Automatik Entertainment\", \"id\": 281...   \n",
+       "4772    3.277287  [{\"name\": \"FM Productions\", \"id\": 12601}, {\"na...   \n",
+       "4778    1.330379                                                 []   \n",
+       "4787    0.048948                                                 []   \n",
+       "4802   23.307949                 [{\"name\": \"Thinkfilm\", \"id\": 446}]   \n",
+       "\n",
+       "         revenue                                     title  vote_count  \\\n",
+       "0     2787965087                                    Avatar       11800   \n",
+       "1      961000000  Pirates of the Caribbean: At World's End        4500   \n",
+       "2      880674609                                   Spectre        4466   \n",
+       "3     1084939099                     The Dark Knight Rises        9106   \n",
+       "4      284139100                               John Carter        2124   \n",
+       "...          ...                                       ...         ...   \n",
+       "4764      600896                                The Signal         631   \n",
+       "4772      321952                            The Last Waltz          64   \n",
+       "4778       10000                              Down Terrace          26   \n",
+       "4787           0                                 Dry Spell           1   \n",
+       "4802      424760                                    Primer         658   \n",
+       "\n",
+       "                                                   cast  \\\n",
+       "0     [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...   \n",
+       "1     [{\"cast_id\": 4, \"character\": \"Captain Jack Spa...   \n",
+       "2     [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...   \n",
+       "3     [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...   \n",
+       "4     [{\"cast_id\": 5, \"character\": \"John Carter\", \"c...   \n",
+       "...                                                 ...   \n",
+       "4764  [{\"cast_id\": 1, \"character\": \"Nic Eastman\", \"c...   \n",
+       "4772  [{\"cast_id\": 1, \"character\": \"Himself\", \"credi...   \n",
+       "4778  [{\"cast_id\": 4, \"character\": \"Bill\", \"credit_i...   \n",
+       "4787  [{\"cast_id\": 4, \"character\": \"Sasha\", \"credit_...   \n",
+       "4802  [{\"cast_id\": 1, \"character\": \"Aaron\", \"credit_...   \n",
+       "\n",
+       "                    new_genres  year  \n",
+       "0             Action,Adventure  2009  \n",
+       "1            Adventure,Fantasy  2007  \n",
+       "2             Action,Adventure  2015  \n",
+       "3                 Action,Crime  2012  \n",
+       "4             Action,Adventure  2012  \n",
+       "...                        ...   ...  \n",
+       "4764  Thriller,Science Fiction  2014  \n",
+       "4772         Documentary,Music  1978  \n",
+       "4778              Drama,Action  2009  \n",
+       "4787            Comedy,Romance  2013  \n",
+       "4802     Science Fiction,Drama  2004  \n",
+       "\n",
+       "[1262 rows x 8 columns]"
+      ]
+     },
+     "execution_count": 35,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_3"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_3['cast'] = df_3['cast'].str.split(',').str[5]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_3['cast'] = df_3['cast'].str.split(':').str[1]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_3['cast'] = df_3['cast'].str.split('\"').str[1]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_3['production_companies'] = df_3['production_companies'].str.split(',').str[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_3['production_companies'] = df_3['production_companies'].str.split(':').str[1]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_3['production_companies'] = df_3['production_companies'].str.split('\"').str[1]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenvorbereitung",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
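+    "The preceding cells reduce 'cast' and 'production_companies' to a single name each (the first listed cast member and the first listed production company) by splitting the raw JSON strings. \n",
+    "Rows where this extraction yields nothing, for example an empty list `[]`, become missing values; the next cell counts them (22 in 'production_companies', 5 in 'cast')."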
+   ]
+  },
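+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The positional string splits above depend on the exact field order and break if a value such as a character name contains a comma. As a hedged alternative, the raw strings can be parsed as JSON. This is a minimal sketch, not the method used in this notebook: the helper `first_name` and the two-row `sample` frame are hypothetical and exist only for illustration.\n",
+    "\n",
+    "```python\n",
+    "import json\n",
+    "\n",
+    "import pandas as pd\n",
+    "\n",
+    "\n",
+    "def first_name(raw):\n",
+    "    # Return the 'name' of the first entry in a JSON list string, else None.\n",
+    "    try:\n",
+    "        entries = json.loads(raw)\n",
+    "    except (TypeError, ValueError):\n",
+    "        return None  # missing value or malformed JSON\n",
+    "    if not isinstance(entries, list) or not entries:\n",
+    "        return None  # empty list such as '[]'\n",
+    "    return entries[0].get('name')\n",
+    "\n",
+    "\n",
+    "# Hypothetical sample shaped like the raw 'cast' column\n",
+    "sample = pd.DataFrame({\n",
+    "    'cast': [\n",
+    "        json.dumps([{'cast_id': 242, 'character': 'Jake Sully, Marine', 'name': 'Sam Worthington'}]),\n",
+    "        '[]',\n",
+    "    ]\n",
+    "})\n",
+    "sample['star'] = sample['cast'].apply(first_name)\n",
+    "print(sample['star'].tolist())\n",
+    "```\n",
+    "\n",
+    "Rows for which the helper returns `None` show up as missing values and can be removed with `dropna`, as in the cells that follow."
+   ]
+  },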
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "popularity               0\n",
+       "production_companies    22\n",
+       "revenue                  0\n",
+       "title                    0\n",
+       "vote_count               0\n",
+       "cast                     5\n",
+       "new_genres               0\n",
+       "year                     0\n",
+       "dtype: int64"
+      ]
+     },
+     "execution_count": 42,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_3.isnull().sum()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_5 = df_3.dropna(axis=0)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "# 4. Modeling and Evaluation"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4.1 Testing and Training the Data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The table lists the first films with their popularity scores, production companies, revenue, titles, \n",
+    "vote counts, lead actors, genres and release years."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 44,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>popularity</th>\n",
+       "      <th>production_companies</th>\n",
+       "      <th>revenue</th>\n",
+       "      <th>title</th>\n",
+       "      <th>vote_count</th>\n",
+       "      <th>cast</th>\n",
+       "      <th>new_genres</th>\n",
+       "      <th>year</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>150.437577</td>\n",
+       "      <td>Ingenious Film Partners</td>\n",
+       "      <td>2787965087</td>\n",
+       "      <td>Avatar</td>\n",
+       "      <td>11800</td>\n",
+       "      <td>Sam Worthington</td>\n",
+       "      <td>Action,Adventure</td>\n",
+       "      <td>2009</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>139.082615</td>\n",
+       "      <td>Walt Disney Pictures</td>\n",
+       "      <td>961000000</td>\n",
+       "      <td>Pirates of the Caribbean: At World's End</td>\n",
+       "      <td>4500</td>\n",
+       "      <td>Johnny Depp</td>\n",
+       "      <td>Adventure,Fantasy</td>\n",
+       "      <td>2007</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>107.376788</td>\n",
+       "      <td>Columbia Pictures</td>\n",
+       "      <td>880674609</td>\n",
+       "      <td>Spectre</td>\n",
+       "      <td>4466</td>\n",
+       "      <td>Daniel Craig</td>\n",
+       "      <td>Action,Adventure</td>\n",
+       "      <td>2015</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>112.312950</td>\n",
+       "      <td>Legendary Pictures</td>\n",
+       "      <td>1084939099</td>\n",
+       "      <td>The Dark Knight Rises</td>\n",
+       "      <td>9106</td>\n",
+       "      <td>Christian Bale</td>\n",
+       "      <td>Action,Crime</td>\n",
+       "      <td>2012</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>43.926995</td>\n",
+       "      <td>Walt Disney Pictures</td>\n",
+       "      <td>284139100</td>\n",
+       "      <td>John Carter</td>\n",
+       "      <td>2124</td>\n",
+       "      <td>Taylor Kitsch</td>\n",
+       "      <td>Action,Adventure</td>\n",
+       "      <td>2012</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   popularity     production_companies     revenue  \\\n",
+       "0  150.437577  Ingenious Film Partners  2787965087   \n",
+       "1  139.082615     Walt Disney Pictures   961000000   \n",
+       "2  107.376788        Columbia Pictures   880674609   \n",
+       "3  112.312950       Legendary Pictures  1084939099   \n",
+       "4   43.926995     Walt Disney Pictures   284139100   \n",
+       "\n",
+       "                                      title  vote_count             cast  \\\n",
+       "0                                    Avatar       11800  Sam Worthington   \n",
+       "1  Pirates of the Caribbean: At World's End        4500      Johnny Depp   \n",
+       "2                                   Spectre        4466     Daniel Craig   \n",
+       "3                     The Dark Knight Rises        9106   Christian Bale   \n",
+       "4                               John Carter        2124    Taylor Kitsch   \n",
+       "\n",
+       "          new_genres  year  \n",
+       "0   Action,Adventure  2009  \n",
+       "1  Adventure,Fantasy  2007  \n",
+       "2   Action,Adventure  2015  \n",
+       "3       Action,Crime  2012  \n",
+       "4   Action,Adventure  2012  "
+      ]
+     },
+     "execution_count": 44,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_5.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 45,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_6 = df_5.rename({\"cast\":\"star\"}, axis=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "After renaming 'cast' to 'star', the table shows the same film attributes: popularity, \n",
+    "production company, revenue, title, vote count, lead actor ('star'), genres and release year."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 46,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>popularity</th>\n",
+       "      <th>production_companies</th>\n",
+       "      <th>revenue</th>\n",
+       "      <th>title</th>\n",
+       "      <th>vote_count</th>\n",
+       "      <th>star</th>\n",
+       "      <th>new_genres</th>\n",
+       "      <th>year</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>150.437577</td>\n",
+       "      <td>Ingenious Film Partners</td>\n",
+       "      <td>2787965087</td>\n",
+       "      <td>Avatar</td>\n",
+       "      <td>11800</td>\n",
+       "      <td>Sam Worthington</td>\n",
+       "      <td>Action,Adventure</td>\n",
+       "      <td>2009</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>139.082615</td>\n",
+       "      <td>Walt Disney Pictures</td>\n",
+       "      <td>961000000</td>\n",
+       "      <td>Pirates of the Caribbean: At World's End</td>\n",
+       "      <td>4500</td>\n",
+       "      <td>Johnny Depp</td>\n",
+       "      <td>Adventure,Fantasy</td>\n",
+       "      <td>2007</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>107.376788</td>\n",
+       "      <td>Columbia Pictures</td>\n",
+       "      <td>880674609</td>\n",
+       "      <td>Spectre</td>\n",
+       "      <td>4466</td>\n",
+       "      <td>Daniel Craig</td>\n",
+       "      <td>Action,Adventure</td>\n",
+       "      <td>2015</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>112.312950</td>\n",
+       "      <td>Legendary Pictures</td>\n",
+       "      <td>1084939099</td>\n",
+       "      <td>The Dark Knight Rises</td>\n",
+       "      <td>9106</td>\n",
+       "      <td>Christian Bale</td>\n",
+       "      <td>Action,Crime</td>\n",
+       "      <td>2012</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>43.926995</td>\n",
+       "      <td>Walt Disney Pictures</td>\n",
+       "      <td>284139100</td>\n",
+       "      <td>John Carter</td>\n",
+       "      <td>2124</td>\n",
+       "      <td>Taylor Kitsch</td>\n",
+       "      <td>Action,Adventure</td>\n",
+       "      <td>2012</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   popularity     production_companies     revenue  \\\n",
+       "0  150.437577  Ingenious Film Partners  2787965087   \n",
+       "1  139.082615     Walt Disney Pictures   961000000   \n",
+       "2  107.376788        Columbia Pictures   880674609   \n",
+       "3  112.312950       Legendary Pictures  1084939099   \n",
+       "4   43.926995     Walt Disney Pictures   284139100   \n",
+       "\n",
+       "                                      title  vote_count             star  \\\n",
+       "0                                    Avatar       11800  Sam Worthington   \n",
+       "1  Pirates of the Caribbean: At World's End        4500      Johnny Depp   \n",
+       "2                                   Spectre        4466     Daniel Craig   \n",
+       "3                     The Dark Knight Rises        9106   Christian Bale   \n",
+       "4                               John Carter        2124    Taylor Kitsch   \n",
+       "\n",
+       "          new_genres  year  \n",
+       "0   Action,Adventure  2009  \n",
+       "1  Adventure,Fantasy  2007  \n",
+       "2   Action,Adventure  2015  \n",
+       "3       Action,Crime  2012  \n",
+       "4   Action,Adventure  2012  "
+      ]
+     },
+     "execution_count": 46,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_6.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 47,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_7 = df_6.drop(['year'], axis = 1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 48,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_cleaned = df_7.dropna(axis=0)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The output is an array of histograms showing the distributions of popularity, \n",
+    "revenue and vote count across the films in the table."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 49,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array([[<AxesSubplot:title={'center':'popularity'}>,\n",
+       "        <AxesSubplot:title={'center':'revenue'}>],\n",
+       "       [<AxesSubplot:title={'center':'vote_count'}>, <AxesSubplot:>]],\n",
+       "      dtype=object)"
+      ]
+     },
+     "execution_count": 49,
+     "metadata": {},
+     "output_type": "execute_result"
+    },
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       "<Figure size 720x720 with 4 Axes>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "df_cleaned.hist(figsize=(10,10), bins=50)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The next cell plots the popularity distribution with `sns.distplot`. Seaborn emits a FutureWarning that `distplot` is deprecated and will be removed in a future version; \n",
+    "`displot` or `histplot` should be used instead."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 50,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/Users/Jumana/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:2551: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).\n",
+      "  warnings.warn(msg, FutureWarning)\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "<AxesSubplot:xlabel='popularity', ylabel='Density'>"
+      ]
+     },
+     "execution_count": 50,
+     "metadata": {},
+     "output_type": "execute_result"
+    },
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       "<Figure size 432x288 with 1 Axes>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "sns.distplot(df_cleaned['popularity'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 51,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "q = df_cleaned['popularity'].quantile(0.99)\n",
+    "\n",
+    "data_1 = df_cleaned[df_cleaned['popularity']<q]"
+   ]
+  },
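+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The cell above removes the top 1% of popularity values. The same idea is sketched here on a small hypothetical series so that the effect of the threshold is easy to check by hand; `scores`, `upper` and `trimmed` are illustrative names, not part of this notebook.\n",
+    "\n",
+    "```python\n",
+    "import pandas as pd\n",
+    "\n",
+    "# Hypothetical values 1..100 standing in for a skewed column\n",
+    "scores = pd.Series(range(1, 101), dtype='float64')\n",
+    "\n",
+    "# 99th percentile with pandas' default linear interpolation\n",
+    "upper = scores.quantile(0.99)\n",
+    "\n",
+    "# Keep only rows strictly below the threshold, as in the cell above\n",
+    "trimmed = scores[scores < upper]\n",
+    "print(len(scores), len(trimmed))\n",
+    "```\n",
+    "\n",
+    "Because the comparison is strict, any row exactly equal to the threshold is dropped as well."
+   ]
+  },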
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "After dropping values at or above the 99th percentile of popularity, the distribution is plotted again. The same `distplot` deprecation warning appears \n",
+    "and recommends `displot` or `histplot` as the alternative."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 52,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/Users/Jumana/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:2551: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).\n",
+      "  warnings.warn(msg, FutureWarning)\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "<AxesSubplot:xlabel='popularity', ylabel='Density'>"
+      ]
+     },
+     "execution_count": 52,
+     "metadata": {},
+     "output_type": "execute_result"
+    },
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       "<Figure size 432x288 with 1 Axes>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "sns.distplot(data_1['popularity'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 53,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "q = data_1['vote_count'].quantile(0.01)\n",
+    "\n",
+    "data_2 = data_1[data_1['vote_count']>q]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "After additionally removing films at or below the 1st percentile of vote count, `describe()` summarizes popularity, revenue and vote count for the remaining 1208 films: \n",
+    "number of records, mean, standard deviation, minimum, quartiles and maximum."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 54,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>popularity</th>\n",
+       "      <th>revenue</th>\n",
+       "      <th>vote_count</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>count</th>\n",
+       "      <td>1208.000000</td>\n",
+       "      <td>1.208000e+03</td>\n",
+       "      <td>1208.000000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>mean</th>\n",
+       "      <td>35.506440</td>\n",
+       "      <td>1.669286e+08</td>\n",
+       "      <td>1405.486755</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>std</th>\n",
+       "      <td>29.940760</td>\n",
+       "      <td>2.455862e+08</td>\n",
+       "      <td>1752.879308</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>min</th>\n",
+       "      <td>0.132878</td>\n",
+       "      <td>0.000000e+00</td>\n",
+       "      <td>9.000000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>25%</th>\n",
+       "      <td>13.768277</td>\n",
+       "      <td>1.529117e+07</td>\n",
+       "      <td>279.250000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>50%</th>\n",
+       "      <td>27.153374</td>\n",
+       "      <td>7.233776e+07</td>\n",
+       "      <td>712.000000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>75%</th>\n",
+       "      <td>46.832955</td>\n",
+       "      <td>2.054128e+08</td>\n",
+       "      <td>1805.500000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>max</th>\n",
+       "      <td>167.932870</td>\n",
+       "      <td>2.787965e+09</td>\n",
+       "      <td>13752.000000</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "        popularity       revenue    vote_count\n",
+       "count  1208.000000  1.208000e+03   1208.000000\n",
+       "mean     35.506440  1.669286e+08   1405.486755\n",
+       "std      29.940760  2.455862e+08   1752.879308\n",
+       "min       0.132878  0.000000e+00      9.000000\n",
+       "25%      13.768277  1.529117e+07    279.250000\n",
+       "50%      27.153374  7.233776e+07    712.000000\n",
+       "75%      46.832955  2.054128e+08   1805.500000\n",
+       "max     167.932870  2.787965e+09  13752.000000"
+      ]
+     },
+     "execution_count": 54,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_2.describe()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The warning states that seaborn's `distplot` function is deprecated and will be removed in a future version; \n",
+    "it recommends using `displot` or `histplot` instead."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/Users/Jumana/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:2551: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).\n",
+      "  warnings.warn(msg, FutureWarning)\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "<AxesSubplot:xlabel='vote_count', ylabel='Density'>"
+      ]
+     },
+     "execution_count": 55,
+     "metadata": {},
+     "output_type": "execute_result"
+    },
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       "<Figure size 432x288 with 1 Axes>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "sns.distplot(data_2['vote_count'])"
+   ]
+  },
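+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A minimal sketch of the replacement the warning suggests, assuming seaborn 0.11 or later. The synthetic `votes` series is a hypothetical stand-in for `data_2['vote_count']`; `histplot` with `kde=True` and `stat='density'` should roughly reproduce what `distplot` drew, although the default binning may differ.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import seaborn as sns\n",
+    "\n",
+    "# Hypothetical stand-in for data_2['vote_count'] (not the real data)\n",
+    "rng = np.random.default_rng(0)\n",
+    "votes = pd.Series(rng.integers(9, 14000, size=200), name='vote_count')\n",
+    "\n",
+    "# Axes-level histogram on a density scale with a KDE overlay\n",
+    "ax = sns.histplot(votes, kde=True, stat='density')\n",
+    "```"
+   ]
+  },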
+  {
+   "cell_type": "code",
+   "execution_count": 56,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Drop extreme revenue outliers with a fixed cutoff of 1.5e9\n",
+    "# (alternative: q = data_2['revenue'].quantile(0.99))\n",
+    "\n",
+    "data_3 = data_2[data_2['revenue']<1.5e+09]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The output is a grid of histograms showing the distributions of popularity, \n",
+    "revenue, and vote count for the films in `data_3`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 57,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array([[<AxesSubplot:title={'center':'popularity'}>,\n",
+       "        <AxesSubplot:title={'center':'revenue'}>],\n",
+       "       [<AxesSubplot:title={'center':'vote_count'}>, <AxesSubplot:>]],\n",
+       "      dtype=object)"
+      ]
+     },
+     "execution_count": 57,
+     "metadata": {},
+     "output_type": "execute_result"
+    },
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       "<Figure size 1800x1800 with 4 Axes>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "data_3.hist(figsize=(25,25), bins=50)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 58,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_final = data_3.reset_index(drop=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 59,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_final['revenue'] = data_final['revenue'].astype(float)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The table shows the first five rows of `data_final`: each film's popularity, production company, revenue, title, \n",
+    "vote count, lead actor (`star`), and genres."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 60,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>popularity</th>\n",
+       "      <th>production_companies</th>\n",
+       "      <th>revenue</th>\n",
+       "      <th>title</th>\n",
+       "      <th>vote_count</th>\n",
+       "      <th>star</th>\n",
+       "      <th>new_genres</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>139.082615</td>\n",
+       "      <td>Walt Disney Pictures</td>\n",
+       "      <td>9.610000e+08</td>\n",
+       "      <td>Pirates of the Caribbean: At World's End</td>\n",
+       "      <td>4500</td>\n",
+       "      <td>Johnny Depp</td>\n",
+       "      <td>Adventure,Fantasy</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>107.376788</td>\n",
+       "      <td>Columbia Pictures</td>\n",
+       "      <td>8.806746e+08</td>\n",
+       "      <td>Spectre</td>\n",
+       "      <td>4466</td>\n",
+       "      <td>Daniel Craig</td>\n",
+       "      <td>Action,Adventure</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>112.312950</td>\n",
+       "      <td>Legendary Pictures</td>\n",
+       "      <td>1.084939e+09</td>\n",
+       "      <td>The Dark Knight Rises</td>\n",
+       "      <td>9106</td>\n",
+       "      <td>Christian Bale</td>\n",
+       "      <td>Action,Crime</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>43.926995</td>\n",
+       "      <td>Walt Disney Pictures</td>\n",
+       "      <td>2.841391e+08</td>\n",
+       "      <td>John Carter</td>\n",
+       "      <td>2124</td>\n",
+       "      <td>Taylor Kitsch</td>\n",
+       "      <td>Action,Adventure</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>115.699814</td>\n",
+       "      <td>Columbia Pictures</td>\n",
+       "      <td>8.908716e+08</td>\n",
+       "      <td>Spider-Man 3</td>\n",
+       "      <td>3576</td>\n",
+       "      <td>Tobey Maguire</td>\n",
+       "      <td>Fantasy,Action</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   popularity  production_companies       revenue  \\\n",
+       "0  139.082615  Walt Disney Pictures  9.610000e+08   \n",
+       "1  107.376788     Columbia Pictures  8.806746e+08   \n",
+       "2  112.312950    Legendary Pictures  1.084939e+09   \n",
+       "3   43.926995  Walt Disney Pictures  2.841391e+08   \n",
+       "4  115.699814     Columbia Pictures  8.908716e+08   \n",
+       "\n",
+       "                                      title  vote_count            star  \\\n",
+       "0  Pirates of the Caribbean: At World's End        4500     Johnny Depp   \n",
+       "1                                   Spectre        4466    Daniel Craig   \n",
+       "2                     The Dark Knight Rises        9106  Christian Bale   \n",
+       "3                               John Carter        2124   Taylor Kitsch   \n",
+       "4                              Spider-Man 3        3576   Tobey Maguire   \n",
+       "\n",
+       "          new_genres  \n",
+       "0  Adventure,Fantasy  \n",
+       "1   Action,Adventure  \n",
+       "2       Action,Crime  \n",
+       "3   Action,Adventure  \n",
+       "4     Fantasy,Action  "
+      ]
+     },
+     "execution_count": 60,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_final.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The table gives summary statistics for the features popularity, production company, \n",
+    "revenue, title, vote count, lead actor, and genres: \n",
+    "the record count for every column; the number of unique values and the most frequent value with its frequency for the categorical columns; \n",
+    "and the mean, standard deviation, and quartiles for the numeric columns."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 61,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>popularity</th>\n",
+       "      <th>production_companies</th>\n",
+       "      <th>revenue</th>\n",
+       "      <th>title</th>\n",
+       "      <th>vote_count</th>\n",
+       "      <th>star</th>\n",
+       "      <th>new_genres</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>count</th>\n",
+       "      <td>1205.000000</td>\n",
+       "      <td>1205</td>\n",
+       "      <td>1.205000e+03</td>\n",
+       "      <td>1205</td>\n",
+       "      <td>1205.000000</td>\n",
+       "      <td>1205</td>\n",
+       "      <td>1205</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>unique</th>\n",
+       "      <td>NaN</td>\n",
+       "      <td>413</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1204</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>659</td>\n",
+       "      <td>132</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>top</th>\n",
+       "      <td>NaN</td>\n",
+       "      <td>Universal Pictures</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>The Host</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>Matt Damon</td>\n",
+       "      <td>Comedy,Drama</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>freq</th>\n",
+       "      <td>NaN</td>\n",
+       "      <td>91</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>2</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>14</td>\n",
+       "      <td>73</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>mean</th>\n",
+       "      <td>35.267109</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.622383e+08</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1383.145228</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>std</th>\n",
+       "      <td>29.569232</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>2.255582e+08</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1693.870514</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>min</th>\n",
+       "      <td>0.132878</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.000000e+00</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>9.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>25%</th>\n",
+       "      <td>13.707843</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.525000e+07</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>277.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>50%</th>\n",
+       "      <td>27.082182</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>7.210861e+07</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>705.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>75%</th>\n",
+       "      <td>46.630062</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>2.034276e+08</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1798.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>max</th>\n",
+       "      <td>167.932870</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.405404e+09</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>13752.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "         popularity production_companies       revenue     title  \\\n",
+       "count   1205.000000                 1205  1.205000e+03      1205   \n",
+       "unique          NaN                  413           NaN      1204   \n",
+       "top             NaN   Universal Pictures           NaN  The Host   \n",
+       "freq            NaN                   91           NaN         2   \n",
+       "mean      35.267109                  NaN  1.622383e+08       NaN   \n",
+       "std       29.569232                  NaN  2.255582e+08       NaN   \n",
+       "min        0.132878                  NaN  0.000000e+00       NaN   \n",
+       "25%       13.707843                  NaN  1.525000e+07       NaN   \n",
+       "50%       27.082182                  NaN  7.210861e+07       NaN   \n",
+       "75%       46.630062                  NaN  2.034276e+08       NaN   \n",
+       "max      167.932870                  NaN  1.405404e+09       NaN   \n",
+       "\n",
+       "          vote_count        star    new_genres  \n",
+       "count    1205.000000        1205          1205  \n",
+       "unique           NaN         659           132  \n",
+       "top              NaN  Matt Damon  Comedy,Drama  \n",
+       "freq             NaN          14            73  \n",
+       "mean     1383.145228         NaN           NaN  \n",
+       "std      1693.870514         NaN           NaN  \n",
+       "min         9.000000         NaN           NaN  \n",
+       "25%       277.000000         NaN           NaN  \n",
+       "50%       705.000000         NaN           NaN  \n",
+       "75%      1798.000000         NaN           NaN  \n",
+       "max     13752.000000         NaN           NaN  "
+      ]
+     },
+     "execution_count": 61,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_final.describe(include='all')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The output shows that none of the features \n",
+    "(popularity, production company, revenue, title, vote count, lead actor, genres) has missing values."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 62,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "popularity              0\n",
+       "production_companies    0\n",
+       "revenue                 0\n",
+       "title                   0\n",
+       "vote_count              0\n",
+       "star                    0\n",
+       "new_genres              0\n",
+       "dtype: int64"
+      ]
+     },
+     "execution_count": 62,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_final.isnull().sum()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 63,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array(['popularity', 'production_companies', 'revenue', 'title',\n",
+       "       'vote_count', 'star', 'new_genres'], dtype=object)"
+      ]
+     },
+     "execution_count": 63,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_final.columns.values"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The output summarizes `data_final`: a pandas DataFrame with 1205 entries and 7 columns, \n",
+    "with the dtypes `float64` (2 columns), `int64` (1 column), and `object` (4 columns)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 64,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<class 'pandas.core.frame.DataFrame'>\n",
+      "RangeIndex: 1205 entries, 0 to 1204\n",
+      "Data columns (total 7 columns):\n",
+      " #   Column                Non-Null Count  Dtype  \n",
+      "---  ------                --------------  -----  \n",
+      " 0   popularity            1205 non-null   float64\n",
+      " 1   production_companies  1205 non-null   object \n",
+      " 2   revenue               1205 non-null   float64\n",
+      " 3   title                 1205 non-null   object \n",
+      " 4   vote_count            1205 non-null   int64  \n",
+      " 5   star                  1205 non-null   object \n",
+      " 6   new_genres            1205 non-null   object \n",
+      "dtypes: float64(2), int64(1), object(4)\n",
+      "memory usage: 66.0+ KB\n"
+     ]
+    }
+   ],
+   "source": [
+    "data_final.info()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The code computes the variance inflation factor (VIF) for the numeric variables \n",
+    "`revenue` and `vote_count` \n",
+    "in the DataFrame and stores the results, together with the variable names, in a new DataFrame."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 65,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from statsmodels.stats.outliers_influence import variance_inflation_factor\n",
+    "\n",
+    "# Since categorical data is not preprocessed, take only the numerical data.\n",
+    "variables = data_final[['revenue', 'vote_count']]\n",
+    "\n",
+    "# Create a new data frame which includes all VIFs (Variance Inflation Factor)\n",
+    "# Each variable has its own variance inflation factor. This measure is variable specific\n",
+    "vif = pd.DataFrame()\n",
+    "\n",
+    "# Compute the VIF of each column with the variance_inflation_factor function\n",
+    "vif[\"VIF\"] = [variance_inflation_factor(variables.values, i) for i in range(variables.shape[1])]\n",
+    "\n",
+    "# Include variable names so it is easier to explore the result\n",
+    "vif[\"Features\"] = variables.columns"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The table shows the computed VIF values for the features \n",
+    "`revenue` and `vote_count`, together with the feature names. Both values are about 3.74; with exactly two predictors the two VIFs always coincide."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 66,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>VIF</th>\n",
+       "      <th>Features</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>3.741807</td>\n",
+       "      <td>revenue</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>3.741807</td>\n",
+       "      <td>vote_count</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "        VIF    Features\n",
+       "0  3.741807     revenue\n",
+       "1  3.741807  vote_count"
+      ]
+     },
+     "execution_count": 66,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Explore the result\n",
+    "vif"
+   ]
+  },
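+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`variance_inflation_factor` regresses each column on the other columns exactly as they are passed in, so the design matrix is usually expected to contain an intercept column. The following is a sketch only, not the computation used above: it adds a constant with `add_constant` before computing the VIFs. The synthetic `variables` frame is a hypothetical stand-in for `data_final[['revenue', 'vote_count']]`.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from statsmodels.stats.outliers_influence import variance_inflation_factor\n",
+    "from statsmodels.tools.tools import add_constant\n",
+    "\n",
+    "# Hypothetical stand-in data: two positively correlated numeric features\n",
+    "rng = np.random.default_rng(0)\n",
+    "revenue = rng.gamma(2.0, 1e8, size=300)\n",
+    "vote_count = revenue / 1e5 + rng.normal(0, 500, size=300)\n",
+    "variables = pd.DataFrame({'revenue': revenue, 'vote_count': vote_count})\n",
+    "\n",
+    "# Prepend an intercept column named 'const' to the design matrix\n",
+    "X = add_constant(variables)\n",
+    "\n",
+    "vif = pd.DataFrame({\n",
+    "    'Features': X.columns,\n",
+    "    'VIF': [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],\n",
+    "})\n",
+    "\n",
+    "# The VIF of the intercept itself is not of interest\n",
+    "vif = vif[vif['Features'] != 'const'].reset_index(drop=True)\n",
+    "```\n",
+    "\n",
+    "With an intercept included, the values can differ from those computed on the raw columns alone."
+   ]
+  },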
+  {
+   "cell_type": "code",
+   "execution_count": 67,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Drop the categorical columns 'star' and 'production_companies' from the model data\n",
+    "data_final = data_final.drop(['star', 'production_companies'],axis=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The table gives summary statistics for the remaining features popularity, revenue, title, vote count, and genres: the record count, the number of unique values, the most frequent value, the means,\n",
+    "the standard deviations, and the quartiles."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 68,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>popularity</th>\n",
+       "      <th>revenue</th>\n",
+       "      <th>title</th>\n",
+       "      <th>vote_count</th>\n",
+       "      <th>new_genres</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>count</th>\n",
+       "      <td>1205.000000</td>\n",
+       "      <td>1.205000e+03</td>\n",
+       "      <td>1205</td>\n",
+       "      <td>1205.000000</td>\n",
+       "      <td>1205</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>unique</th>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1204</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>132</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>top</th>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>The Host</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>Comedy,Drama</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>freq</th>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>2</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>73</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>mean</th>\n",
+       "      <td>35.267109</td>\n",
+       "      <td>1.622383e+08</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1383.145228</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>std</th>\n",
+       "      <td>29.569232</td>\n",
+       "      <td>2.255582e+08</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1693.870514</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>min</th>\n",
+       "      <td>0.132878</td>\n",
+       "      <td>0.000000e+00</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>9.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>25%</th>\n",
+       "      <td>13.707843</td>\n",
+       "      <td>1.525000e+07</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>277.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>50%</th>\n",
+       "      <td>27.082182</td>\n",
+       "      <td>7.210861e+07</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>705.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>75%</th>\n",
+       "      <td>46.630062</td>\n",
+       "      <td>2.034276e+08</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1798.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>max</th>\n",
+       "      <td>167.932870</td>\n",
+       "      <td>1.405404e+09</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>13752.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "         popularity       revenue     title    vote_count    new_genres\n",
+       "count   1205.000000  1.205000e+03      1205   1205.000000          1205\n",
+       "unique          NaN           NaN      1204           NaN           132\n",
+       "top             NaN           NaN  The Host           NaN  Comedy,Drama\n",
+       "freq            NaN           NaN         2           NaN            73\n",
+       "mean      35.267109  1.622383e+08       NaN   1383.145228           NaN\n",
+       "std       29.569232  2.255582e+08       NaN   1693.870514           NaN\n",
+       "min        0.132878  0.000000e+00       NaN      9.000000           NaN\n",
+       "25%       13.707843  1.525000e+07       NaN    277.000000           NaN\n",
+       "50%       27.082182  7.210861e+07       NaN    705.000000           NaN\n",
+       "75%       46.630062  2.034276e+08       NaN   1798.000000           NaN\n",
+       "max      167.932870  1.405404e+09       NaN  13752.000000           NaN"
+      ]
+     },
+     "execution_count": 68,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_final.describe(include='all')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 69,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_with_dummies = pd.get_dummies(data_final, drop_first=True)"
+   ]
+  },
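+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "By default `pd.get_dummies` encodes every object column, so `title` (1204 unique values) yields roughly one indicator column per film. A sketch of one alternative, assuming only the genre indicators are wanted: drop `title` and encode `new_genres` alone. The small `df` frame is hypothetical illustration data, not the notebook's dataset.\n",
+    "\n",
+    "```python\n",
+    "import pandas as pd\n",
+    "\n",
+    "# Hypothetical miniature version of data_final\n",
+    "df = pd.DataFrame({\n",
+    "    'popularity': [139.1, 107.4, 43.9],\n",
+    "    'title': ['Film A', 'Film B', 'Film C'],\n",
+    "    'new_genres': ['Adventure,Fantasy', 'Action,Adventure', 'Action,Adventure'],\n",
+    "})\n",
+    "\n",
+    "# Encode only the genre column; the title is an identifier, not a feature\n",
+    "encoded = pd.get_dummies(\n",
+    "    df.drop(columns=['title']),\n",
+    "    columns=['new_genres'],\n",
+    "    drop_first=True,\n",
+    ")\n",
+    "```"
+   ]
+  },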
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The table shows `data_with_dummies`: binary indicator columns for the values of \n",
+    "`title` and `new_genres`, together with the numeric columns popularity, \n",
+    "revenue, and vote count for each film."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 70,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>popularity</th>\n",
+       "      <th>revenue</th>\n",
+       "      <th>vote_count</th>\n",
+       "      <th>title_(500) Days of Summer</th>\n",
+       "      <th>title_10 Cloverfield Lane</th>\n",
+       "      <th>title_12 Rounds</th>\n",
+       "      <th>title_13 Hours: The Secret Soldiers of Benghazi</th>\n",
+       "      <th>title_1408</th>\n",
+       "      <th>title_1911</th>\n",
+       "      <th>title_2 Guns</th>\n",
+       "      <th>...</th>\n",
+       "      <th>new_genres_Thriller,Crime</th>\n",
+       "      <th>new_genres_Thriller,Documentary</th>\n",
+       "      <th>new_genres_Thriller,Drama</th>\n",
+       "      <th>new_genres_Thriller,Horror</th>\n",
+       "      <th>new_genres_Thriller,Mystery</th>\n",
+       "      <th>new_genres_Thriller,Science Fiction</th>\n",
+       "      <th>new_genres_War,Action</th>\n",
+       "      <th>new_genres_War,Crime</th>\n",
+       "      <th>new_genres_War,Drama</th>\n",
+       "      <th>new_genres_Western,Drama</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>139.082615</td>\n",
+       "      <td>9.610000e+08</td>\n",
+       "      <td>4500</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>107.376788</td>\n",
+       "      <td>8.806746e+08</td>\n",
+       "      <td>4466</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>112.312950</td>\n",
+       "      <td>1.084939e+09</td>\n",
+       "      <td>9106</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>43.926995</td>\n",
+       "      <td>2.841391e+08</td>\n",
+       "      <td>2124</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>115.699814</td>\n",
+       "      <td>8.908716e+08</td>\n",
+       "      <td>3576</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>5 rows × 1337 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   popularity       revenue  vote_count  title_(500) Days of Summer  \\\n",
+       "0  139.082615  9.610000e+08        4500                           0   \n",
+       "1  107.376788  8.806746e+08        4466                           0   \n",
+       "2  112.312950  1.084939e+09        9106                           0   \n",
+       "3   43.926995  2.841391e+08        2124                           0   \n",
+       "4  115.699814  8.908716e+08        3576                           0   \n",
+       "\n",
+       "   title_10 Cloverfield Lane  title_12 Rounds  \\\n",
+       "0                          0                0   \n",
+       "1                          0                0   \n",
+       "2                          0                0   \n",
+       "3                          0                0   \n",
+       "4                          0                0   \n",
+       "\n",
+       "   title_13 Hours: The Secret Soldiers of Benghazi  title_1408  title_1911  \\\n",
+       "0                                                0           0           0   \n",
+       "1                                                0           0           0   \n",
+       "2                                                0           0           0   \n",
+       "3                                                0           0           0   \n",
+       "4                                                0           0           0   \n",
+       "\n",
+       "   title_2 Guns  ...  new_genres_Thriller,Crime  \\\n",
+       "0             0  ...                          0   \n",
+       "1             0  ...                          0   \n",
+       "2             0  ...                          0   \n",
+       "3             0  ...                          0   \n",
+       "4             0  ...                          0   \n",
+       "\n",
+       "   new_genres_Thriller,Documentary  new_genres_Thriller,Drama  \\\n",
+       "0                                0                          0   \n",
+       "1                                0                          0   \n",
+       "2                                0                          0   \n",
+       "3                                0                          0   \n",
+       "4                                0                          0   \n",
+       "\n",
+       "   new_genres_Thriller,Horror  new_genres_Thriller,Mystery  \\\n",
+       "0                           0                            0   \n",
+       "1                           0                            0   \n",
+       "2                           0                            0   \n",
+       "3                           0                            0   \n",
+       "4                           0                            0   \n",
+       "\n",
+       "   new_genres_Thriller,Science Fiction  new_genres_War,Action  \\\n",
+       "0                                    0                      0   \n",
+       "1                                    0                      0   \n",
+       "2                                    0                      0   \n",
+       "3                                    0                      0   \n",
+       "4                                    0                      0   \n",
+       "\n",
+       "   new_genres_War,Crime  new_genres_War,Drama  new_genres_Western,Drama  \n",
+       "0                     0                     0                         0  \n",
+       "1                     0                     0                         0  \n",
+       "2                     0                     0                         0  \n",
+       "3                     0                     0                         0  \n",
+       "4                     0                     0                         0  \n",
+       "\n",
+       "[5 rows x 1337 columns]"
+      ]
+     },
+     "execution_count": 70,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_with_dummies.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 71,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "target = data_with_dummies['popularity']\n",
+    "predictors = data_with_dummies.drop(['popularity'],axis=1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 72,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# 80-20 split into training and test data\n",
+    "X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.2, random_state=123)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The code standardizes the features: the scaler is fitted on the training data only, \n",
+    "and the same transformation is then applied to both the training and the test data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 73,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "scaler = StandardScaler()\n",
+    "scaler.fit(X_train)\n",
+    "\n",
+    "X_train = scaler.transform(X_train)\n",
+    "X_test = scaler.transform(X_test)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The code creates a linear regression model and fits it to the training data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 74,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "LinearRegression()"
+      ]
+     },
+     "execution_count": 74,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "reg = LinearRegression()\n",
+    "reg.fit(X_train,y_train)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The code evaluates the linear regression model on the training data and on the test data \n",
+    "and prints the results. `reg.score` returns the coefficient of determination ($R^2$)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 75,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "training performance\n",
+      "1.0\n",
+      "test performance\n",
+      "0.5546065264388957\n"
+     ]
+    }
+   ],
+   "source": [
+    "print('training performance')\n",
+    "print(reg.score(X_train,y_train))\n",
+    "print('test performance')\n",
+    "print(reg.score(X_test,y_test))"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4.2 Linear Regression"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Datenmodell",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The code generates predictions with the fitted model, compares them with the actual values, \n",
+    "and visualizes the results as a line plot and a scatter plot with a regression line."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 76,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       "<Figure size 1152x576 with 1 Axes>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       "<Figure size 432x432 with 3 Axes>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
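+    "y_pred = reg.predict(X_test)\n",
+    "# Collect predictions and actual values side by side with a fresh 0..n-1 index\n",
+    "test = pd.DataFrame({'Predicted': y_pred, 'Actual': y_test}).reset_index(drop=True)\n",
+    "\n",
+    "# Line plot of the first 50 test samples; the legend must follow the column order\n",
+    "fig = plt.figure(figsize=(16, 8))\n",
+    "plt.plot(test[:50])\n",
+    "plt.legend(['Predicted', 'Actual'])\n",
+    "\n",
+    "# Scatter plot of actual vs. predicted values with a fitted regression line\n",
+    "sns.jointplot(x='Actual', y='Predicted', data=test, kind='reg');"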
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Evaluation",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "# 5. Evaluation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Evaluation",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "The analysis of the tmdb dataset, carried out to improve movie recommendations on Netflix, suggests \n",
+    "that a variety of factors can influence the popularity or ratings of movies.\n",
+    "These include both numerical and categorical data such as crew members, cast, and unique IDs. \n",
+    "The linear regression model reaches an $R^2$ of 1.0 on the training data but only about 0.55 on the test data, \n",
+    "which indicates overfitting and leaves room for improvement. \n",
+    "By applying machine learning to this data, strategies can be developed \n",
+    "to increase customer satisfaction and lower churn rates. \n",
+    "A thorough analysis and modeling of these factors makes it possible to generate personalized recommendations \n",
+    "and to improve the user experience. The conclusion of this work is that the data offers a solid foundation \n",
+    "for developing innovative movie recommendation solutions that can increase the satisfaction of Netflix users."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Umsetzung",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "# 6. Deployment"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Umsetzung",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "To lower churn rates at Netflix and increase customer satisfaction, machine learning and data analysis can be used to identify relevant factors such as crew members, cast, and genre, and to train a model that predicts the popularity or ratings of movies.\n",
+    "These predictions can then feed personalized recommendations."
+   ]
+  }
+ ],
+ "metadata": {
+  "category": "CRM",
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.2"
+  },
+  "skipNotebookInDeployment": false,
+  "title": "Increase customer satisfaction"
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}