diff --git a/CRM/Customer Churn Prediction/notebook.ipynb b/CRM/Customer Churn Prediction/notebook.ipynb index 4ef48e358b89d7df45815621859957ed64bab2c6..698e464af50e53fe910401defe3e689fc119b2ba 100644 --- a/CRM/Customer Churn Prediction/notebook.ipynb +++ b/CRM/Customer Churn Prediction/notebook.ipynb @@ -1,2220 +1,34 @@ { "cells": [ { + "attachments": {}, "cell_type": "markdown", "metadata": { "editable": true, + "include": true, + "paragraph": "BusinessUnderstanding", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ - "## Business Understanding\n", - "The analysis aims to predict customer churn in the telecommunications sector so that retention measures can be taken proactively and potential churn can be minimized. This contributes to the long-term stability and profitability of the company.\n", - "\n", - "## Data Understanding\n", - "In the business context, the analysis is based on the Telco Customer Churn dataset https://www.kaggle.com/datasets/blastchar/telco-customer-churn/suggestions?status=pending&yourSuggestions=true from Kaggle, dating from 2019, which was provided by a telecommunications company and comprises customer characteristics and contract information for predicting customer churn.\n", - "The dataset contains a wide range of customer attributes, including contract details, demographic characteristics, and service information. With 21 attributes in total, the dataset offers insight into customer behavior and the reasons for possible churn. A careful analysis of the data shows that length of tenure (\"tenure\") and total charges (\"TotalCharges\") are strongly positively correlated, which indicates that customers with longer tenure tend to have higher total charges.\n", - "\n", - "## Data Preparation\n", - "Data preparation is a critical step in ensuring that the data is suitable for modeling. 
\n", - "This includes dimensionality reduction, the handling of missing data and outliers, and the conversion of categorical features into numerical values. \n", - "Particular attention is also paid to checking for multicollinearity, to ensure that the independent variables are not strongly correlated with one another.\n", - "\n", - "## Modeling and Evaluation\n", - "\n", - "A logistic regression analysis is carried out to predict customer churn. \n", - "Logistic regression is a powerful statistical model that models the relationship between a binary target variable (in this case \"Churn\") and the independent variables. \n", - "Setting a threshold determines whether a customer is expected to churn or not.\n", + "# 1. Business Understanding\n", "\n", - "## Deployment \n", - "The model has the potential to be integrated into existing business processes to help companies respond proactively to customer churn. \n", - "By identifying customers who are likely to churn, companies can take targeted measures to retain these customers and strengthen customer loyalty. \n", - "This can help minimize revenue losses and promote the long-term growth of the company." + "Test Input Customer churn is a customer's decision to stop purchasing a particular company service. It is thus the counterpart to long-term customer loyalty. To promote customer loyalty, companies must use analyses that detect early on whether a customer intends to leave the company. Marketing and sales measures can then be initiated before the customer is actually lost. 
In this context, the service answers two specific questions: With what probability can it be predicted from historical data whether a customer will switch to another provider? Which factors lead to customer churn?" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { + "editable": true, "include": true, - "paragraph": "BusinessUnderstanding" - }, - "source": [ - "# 1. Business Understanding\n", - "\n", - "Customer churn is a customer's decision to stop purchasing a particular company service. It is thus the counterpart to long-term customer loyalty. To promote customer loyalty, companies must use analyses that detect early on whether a customer intends to leave the company. Marketing and sales measures can then be initiated before the customer is actually lost. In this context, the service answers two specific questions: With what probability can it be predicted from historical data whether a customer will switch to another provider? Which factors lead to customer churn?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1. Business Understanding\n", - "This case study is intended to support technology companies in developing and improving components for autonomous vehicles. It raises the question: In which areas do the weaknesses of the tested systems lie, and how can they be classified? The question is answered by classifying the reasons for the human test driver's intervention in the system. The reasons are to be divided into the following categories:\n", - "- Software problems\n", - "- Hardware problems\n", - "- Software and hardware problems (problems that occur in combination)\n", - "- Problems caused by traffic control objects (e.g. 
malfunctioning traffic signals)\n", - "- Problems caused by other road users\n", - "- External influences (including, e.g., obstructions, occluded elements, weather and road conditions)\n", - "- Other problems" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2. Data and Data Understanding \n", - "The underlying dataset was obtained from Kaggle and shows necessary driver interventions in autonomous vehicle technology. It was recorded from December 1, 2018 to November 30, 2019 and is licensed under U.S. Government Works. The data is processed using a Jupyter notebook, which is included in the widely used data science platform \"Anaconda\". The dataset has the shape 8885 x 9. The following data was recorded:\n", - "- Manufacturer\n", - "- Permit number\n", - "- Date\n", - "- Vehicle identification number\n", - "- Capability of operating without a driver\n", - "- Is a driver present?\n", - "- Disengagement of autonomous driving initiated by AV system, test driver, remote operator, or passenger\n", - "- Location of the incident: freeway, highway, rural road, street, or parking facility\n", - "- Description of the causes" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.1. 
Importing relevant modules" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import pandas as pd\n", - "import statsmodels.api as sm\n", - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "sns.set()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'1.3.4'" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "pd.__version__" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.2 Loading the data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "raw_data = pd.read_csv('https://storage.googleapis.com/ml-service-repository-datastorage/Improvement_of_components_for_autonomous_motor_vehicles_data.csv')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Manufacturer</th>\n", - " <th>Permit Number</th>\n", - " <th>DATE</th>\n", - " <th>VIN NUMBER</th>\n", - " <th>VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\\n(Yes or No)</th>\n", - " <th>DRIVER PRESENT\\n(Yes or No)</th>\n", - " <th>DISENGAGEMENT INITIATED BY\\n(AV System, Test Driver, Remote Operator, or Passenger)</th>\n", - " <th>DISENGAGEMENT\\nLOCATION\\n(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility)</th>\n", - " <th>DESCRIPTION OF FACTS CAUSING DISENGAGEMENT</th>\n", - " 
</tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>12.06.2018</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>12.10.2018</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>12.10.2018</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>04.23.2019</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>05.14.2019</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver to the exit lane: risk of...</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Manufacturer Permit Number DATE VIN NUMBER \\\n", - "0 AImotive Inc. AVT003 12.06.2018 JTDKN3DU5A1092792 \n", - "1 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 \n", - "2 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 \n", - "3 AImotive Inc. 
AVT003 04.23.2019 JTDKN3DU5A1092792 \n", - "4 AImotive Inc. AVT003 05.14.2019 JTDKN3DU5A1092792 \n", - "\n", - " VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\\n(Yes or No) \\\n", - "0 No \n", - "1 No \n", - "2 No \n", - "3 No \n", - "4 No \n", - "\n", - " DRIVER PRESENT\\n(Yes or No) \\\n", - "0 Yes \n", - "1 Yes \n", - "2 Yes \n", - "3 Yes \n", - "4 Yes \n", - "\n", - " DISENGAGEMENT INITIATED BY\\n(AV System, Test Driver, Remote Operator, or Passenger) \\\n", - "0 Test Driver \n", - "1 Test Driver \n", - "2 Test Driver \n", - "3 Test Driver \n", - "4 Test Driver \n", - "\n", - " DISENGAGEMENT\\nLOCATION\\n(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility) \\\n", - "0 Freeway \n", - "1 Freeway \n", - "2 Freeway \n", - "3 Freeway \n", - "4 Freeway \n", - "\n", - " DESCRIPTION OF FACTS CAUSING DISENGAGEMENT \n", - "0 Lane change maneuver: risk of lane departure, ... \n", - "1 Lane change maneuver: risk of lane departure, ... \n", - "2 Lane change maneuver: risk of lane departure, ... \n", - "3 Lane change maneuver: risk of lane departure, ... \n", - "4 Lane change maneuver to the exit lane: risk of... " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "raw_data.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.3. 
Data cleaning" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "<class 'pandas.core.frame.DataFrame'>\n", - "RangeIndex: 8885 entries, 0 to 8884\n", - "Data columns (total 9 columns):\n", - " # Column Non-Null Count Dtype \n", - "--- ------ -------------- ----- \n", - " 0 Manufacturer 8885 non-null object\n", - " 1 Permit Number 8885 non-null object\n", - " 2 DATE 8884 non-null object\n", - " 3 VIN NUMBER 8884 non-null object\n", - " 4 VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\n", - "(Yes or No) 8884 non-null object\n", - " 5 DRIVER PRESENT\n", - "(Yes or No) 8884 non-null object\n", - " 6 DISENGAGEMENT INITIATED BY\n", - "(AV System, Test Driver, Remote Operator, or Passenger) 8884 non-null object\n", - " 7 DISENGAGEMENT\n", - "LOCATION\n", - "(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility) 8884 non-null object\n", - " 8 DESCRIPTION OF FACTS CAUSING DISENGAGEMENT 8884 non-null object\n", - "dtypes: object(9)\n", - "memory usage: 624.9+ KB\n" - ] - } - ], - "source": [ - "raw_data.info()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Manufacturer</th>\n", - " <th>Permit Number</th>\n", - " <th>DATE</th>\n", - " <th>VIN NUMBER</th>\n", - " <th>VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\\n(Yes or No)</th>\n", - " <th>DRIVER PRESENT\\n(Yes or No)</th>\n", - " <th>DISENGAGEMENT INITIATED BY\\n(AV System, Test Driver, 
Remote Operator, or Passenger)</th>\n", - " <th>DISENGAGEMENT\\nLOCATION\\n(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility)</th>\n", - " <th>DESCRIPTION OF FACTS CAUSING DISENGAGEMENT</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>count</th>\n", - " <td>8885</td>\n", - " <td>8885</td>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " </tr>\n", - " <tr>\n", - " <th>unique</th>\n", - " <td>28</td>\n", - " <td>27</td>\n", - " <td>3711</td>\n", - " <td>289</td>\n", - " <td>5</td>\n", - " <td>4</td>\n", - " <td>4</td>\n", - " <td>11</td>\n", - " <td>469</td>\n", - " </tr>\n", - " <tr>\n", - " <th>top</th>\n", - " <td>Toyota Research Institute</td>\n", - " <td>AVT050</td>\n", - " <td>3/28/2019</td>\n", - " <td>JTHDU1EF3G5020098</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Street</td>\n", - " <td>Safety Driver proactive disengagement.</td>\n", - " </tr>\n", - " <tr>\n", - " <th>freq</th>\n", - " <td>2947</td>\n", - " <td>2947</td>\n", - " <td>59</td>\n", - " <td>900</td>\n", - " <td>4369</td>\n", - " <td>4934</td>\n", - " <td>6037</td>\n", - " <td>4668</td>\n", - " <td>1780</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Manufacturer Permit Number DATE VIN NUMBER \\\n", - "count 8885 8885 8884 8884 \n", - "unique 28 27 3711 289 \n", - "top Toyota Research Institute AVT050 3/28/2019 JTHDU1EF3G5020098 \n", - "freq 2947 2947 59 900 \n", - "\n", - " VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\\n(Yes or No) \\\n", - "count 8884 \n", - "unique 5 \n", - "top No \n", - "freq 4369 \n", - "\n", - " DRIVER PRESENT\\n(Yes or No) \\\n", - "count 8884 \n", - "unique 4 \n", - "top Yes \n", - "freq 4934 \n", - "\n", - " DISENGAGEMENT INITIATED BY\\n(AV System, Test Driver, Remote Operator, or Passenger) \\\n", - "count 8884 \n", - 
"unique 4 \n", - "top Test Driver \n", - "freq 6037 \n", - "\n", - " DISENGAGEMENT\\nLOCATION\\n(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility) \\\n", - "count 8884 \n", - "unique 11 \n", - "top Street \n", - "freq 4668 \n", - "\n", - " DESCRIPTION OF FACTS CAUSING DISENGAGEMENT \n", - "count 8884 \n", - "unique 469 \n", - "top Safety Driver proactive disengagement. \n", - "freq 1780 " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "raw_data.describe(include=\"all\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "raw_data.rename(columns = {'VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\\n(Yes or No)':'OPERATING WITHOUT DRIVER', 'DRIVER PRESENT\\n(Yes or No)':'DRIVER PRESENT', 'DISENGAGEMENT INITIATED BY\\n(AV System, Test Driver, Remote Operator, or Passenger)':'DISENGAGEMENT INITIATED BY', 'DISENGAGEMENT\\nLOCATION\\n(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility)':'DISENGAGEMENT LOCATION', 'DESCRIPTION OF FACTS CAUSING DISENGAGEMENT':'FACTS CAUSING DISENGAGEMENT'}, inplace = True) " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Missing values" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Manufacturer 0\n", - "Permit Number 0\n", - "DATE 1\n", - "VIN NUMBER 1\n", - "OPERATING WITHOUT DRIVER 1\n", - "DRIVER PRESENT 1\n", - "DISENGAGEMENT INITIATED BY 1\n", - "DISENGAGEMENT LOCATION 1\n", - "FACTS CAUSING DISENGAGEMENT 1\n", - "dtype: int64" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "raw_data.isnull().sum()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Manufacturer 0\n", - "Permit Number 0\n", - "DATE 0\n", - "VIN NUMBER 0\n", - "OPERATING WITHOUT DRIVER 0\n", - 
"DRIVER PRESENT 0\n", - "DISENGAGEMENT INITIATED BY 0\n", - "DISENGAGEMENT LOCATION 0\n", - "FACTS CAUSING DISENGAGEMENT 0\n", - "dtype: int64" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "data_no_mv=raw_data.dropna(axis=0)\n", - "data_no_mv.isnull().sum()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Manufacturer</th>\n", - " <th>Permit Number</th>\n", - " <th>DATE</th>\n", - " <th>VIN NUMBER</th>\n", - " <th>OPERATING WITHOUT DRIVER</th>\n", - " <th>DRIVER PRESENT</th>\n", - " <th>DISENGAGEMENT INITIATED BY</th>\n", - " <th>DISENGAGEMENT LOCATION</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>count</th>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " <td>8884</td>\n", - " </tr>\n", - " <tr>\n", - " <th>unique</th>\n", - " <td>27</td>\n", - " <td>26</td>\n", - " <td>3711</td>\n", - " <td>289</td>\n", - " <td>5</td>\n", - " <td>4</td>\n", - " <td>4</td>\n", - " <td>11</td>\n", - " <td>469</td>\n", - " </tr>\n", - " <tr>\n", - " <th>top</th>\n", - " <td>Toyota Research Institute</td>\n", - " <td>AVT050</td>\n", - " <td>3/28/2019</td>\n", - " <td>JTHDU1EF3G5020098</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Street</td>\n", - " <td>Safety Driver proactive disengagement.</td>\n", 
- " </tr>\n", - " <tr>\n", - " <th>freq</th>\n", - " <td>2947</td>\n", - " <td>2947</td>\n", - " <td>59</td>\n", - " <td>900</td>\n", - " <td>4369</td>\n", - " <td>4934</td>\n", - " <td>6037</td>\n", - " <td>4668</td>\n", - " <td>1780</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Manufacturer Permit Number DATE VIN NUMBER \\\n", - "count 8884 8884 8884 8884 \n", - "unique 27 26 3711 289 \n", - "top Toyota Research Institute AVT050 3/28/2019 JTHDU1EF3G5020098 \n", - "freq 2947 2947 59 900 \n", - "\n", - " OPERATING WITHOUT DRIVER DRIVER PRESENT DISENGAGEMENT INITIATED BY \\\n", - "count 8884 8884 8884 \n", - "unique 5 4 4 \n", - "top No Yes Test Driver \n", - "freq 4369 4934 6037 \n", - "\n", - " DISENGAGEMENT LOCATION FACTS CAUSING DISENGAGEMENT \n", - "count 8884 8884 \n", - "unique 11 469 \n", - "top Street Safety Driver proactive disengagement. \n", - "freq 4668 1780 " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "data_no_mv.describe(include='all')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Data preparation and labeling" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Test Driver 6037\n", - "AV System 2698\n", - "Vehicle Operator 81\n", - "Test driver 68\n", - "Name: DISENGAGEMENT INITIATED BY, dtype: int64" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "data_no_mv['DISENGAGEMENT INITIATED BY'].value_counts()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "469" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "data_no_mv['FACTS CAUSING DISENGAGEMENT'].nunique()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - 
"<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Manufacturer</th>\n", - " <th>Permit Number</th>\n", - " <th>DATE</th>\n", - " <th>VIN NUMBER</th>\n", - " <th>OPERATING WITHOUT DRIVER</th>\n", - " <th>DRIVER PRESENT</th>\n", - " <th>DISENGAGEMENT INITIATED BY</th>\n", - " <th>DISENGAGEMENT LOCATION</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>12.06.2018</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>12.10.2018</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>12.10.2018</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>04.23.2019</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", 
- " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>05.14.2019</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver to the exit lane: risk of...</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Manufacturer Permit Number DATE VIN NUMBER \\\n", - "0 AImotive Inc. AVT003 12.06.2018 JTDKN3DU5A1092792 \n", - "1 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 \n", - "2 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 \n", - "3 AImotive Inc. AVT003 04.23.2019 JTDKN3DU5A1092792 \n", - "4 AImotive Inc. AVT003 05.14.2019 JTDKN3DU5A1092792 \n", - "\n", - " OPERATING WITHOUT DRIVER DRIVER PRESENT DISENGAGEMENT INITIATED BY \\\n", - "0 No Yes Test Driver \n", - "1 No Yes Test Driver \n", - "2 No Yes Test Driver \n", - "3 No Yes Test Driver \n", - "4 No Yes Test Driver \n", - "\n", - " DISENGAGEMENT LOCATION FACTS CAUSING DISENGAGEMENT \n", - "0 Freeway Lane change maneuver: risk of lane departure, ... \n", - "1 Freeway Lane change maneuver: risk of lane departure, ... \n", - "2 Freeway Lane change maneuver: risk of lane departure, ... \n", - "3 Freeway Lane change maneuver: risk of lane departure, ... \n", - "4 Freeway Lane change maneuver to the exit lane: risk of... " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "data_no_mv.head()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Safety Driver proactive disengagement. 1780\n", - "Disengage for unwanted maneuver of the vehicle caused by a planning discrepancy while generating an appropriate trajectory 805\n", - "Automatic disengagement caused by planner fault. 
742\n", - "Disengage due to operator discomfort 636\n", - "Disengage for a software fault due to a potential performance issue with a software component of the self-driving system (including third party software components) 482\n", - " ... \n", - "Planning Logic: planner inadequately yields for cross traffic agent with right-of-way 1\n", - "Planning Logic: planned trajectory fails to avoid vehicle stopped ahead intersection 1\n", - "Object Perception: inaccurate perception of animal slowly crossing road leads to planned trajectory overlap 1\n", - "Planning Logic: incorrect behavior prediction for oncoming vehicle results in a planned trajectory that overlaps with the vehicle 1\n", - "Planning discrepancy; system planned incorrect trajectory to avoid oncoming traffic 1\n", - "Name: FACTS CAUSING DISENGAGEMENT, Length: 469, dtype: int64" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "data_no_mv['FACTS CAUSING DISENGAGEMENT'].value_counts()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Test Driver 6037\n", - "AV System 2698\n", - "Vehicle Operator 81\n", - "Test driver 68\n", - "Name: DISENGAGEMENT INITIATED BY, dtype: int64" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "data_no_mv['DISENGAGEMENT INITIATED BY'].value_counts()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data_reduced = data_no_mv.loc[0:1999, :]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "<class 'pandas.core.frame.DataFrame'>\n", - "Int64Index: 2000 entries, 0 to 1999\n", - "Data columns (total 9 columns):\n", - " # Column Non-Null Count Dtype \n", - "--- ------ -------------- ----- \n", - " 0 Manufacturer 2000 non-null object\n", - " 1 Permit Number 2000 
non-null object\n", - " 2 DATE 2000 non-null object\n", - " 3 VIN NUMBER 2000 non-null object\n", - " 4 OPERATING WITHOUT DRIVER 2000 non-null object\n", - " 5 DRIVER PRESENT 2000 non-null object\n", - " 6 DISENGAGEMENT INITIATED BY 2000 non-null object\n", - " 7 DISENGAGEMENT LOCATION 2000 non-null object\n", - " 8 FACTS CAUSING DISENGAGEMENT 2000 non-null object\n", - "dtypes: object(9)\n", - "memory usage: 156.2+ KB\n" - ] - } - ], - "source": [ - "data_reduced.info()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "92" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "data_reduced['FACTS CAUSING DISENGAGEMENT'].nunique()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/pandas/util/_decorators.py:311: SettingWithCopyWarning: \n", - "A value is trying to be set on a copy of a slice from a DataFrame\n", - "\n", - "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", - " return func(*args, **kwargs)\n" - ] - } - ], - "source": [ - "data_reduced.drop_duplicates(subset =\"FACTS CAUSING DISENGAGEMENT\", \n", - " keep = 'first', inplace = True) " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data_reduced = data_reduced.reset_index()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data_reduced['FACTS CAUSING DISENGAGEMENT'] = data_reduced['FACTS CAUSING DISENGAGEMENT'].astype('str')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "dtype('O')" - ] - }, - "metadata": {}, - 
"output_type": "display_data" - } - ], - "source": [ - "data_reduced['FACTS CAUSING DISENGAGEMENT'].dtype" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import re" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "software_keywords = [\"software performance\", \"software fault\", \"software discrepancy\"\n", - " \"trajectory planning\", \"planning discrepancy\", \"planning error\",\n", - " \"wrong lane change suggestion\", \"wrong lane association\", \"data recording\",\n", - " \"improper lane-change plan\", \"undesirable manuever\", \"undesirable yielding maneuver\",\n", - " \"outside of rate requirements\", \"merged poorly\", \"mapping issue\", \"software issue\",\n", - " \"poor trajectory across lanes\", \"incorrect assessment\", \"incorrect behavior\",\n", - " \"unprotected\", \"Poor lane change\", \"very wide\", \"wrong object prediction\", \"undesired motion\",\n", - " \"unwanted maneuver\", \"perception discrepancy\", \"ghost object prediction\",\n", - " \"driving faster than driver expected\", \"expected path\",\n", - " \"not initialized correctly\", \"software module\", \"perception mismatch\", \"estimation\",\n", - " \"planner fault\", \"unstable\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py:1684: SettingWithCopyWarning: \n", - "A value is trying to be set on a copy of a slice from a DataFrame.\n", - "Try using .loc[row_indexer,col_indexer] = value instead\n", - "\n", - "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", - " self.obj[key] = infer_fill_value(value)\n", - 
"/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py:1817: SettingWithCopyWarning: \n", - "A value is trying to be set on a copy of a slice from a DataFrame.\n", - "Try using .loc[row_indexer,col_indexer] = value instead\n", - "\n", - "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", - " self._setitem_single_column(loc, value, pi)\n" - ] - } - ], - "source": [ - "data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(software_keywords), na=False, case=False)\n", - " , 'Problem class'] = \"Software\" " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "hardware_keywords = [\"hardware performance\", \"hardware diagnostics\", \"controls diagnostics\", \"actor\", \n", - " \"yield to other actors\", \"Hardware irregularity\", \"weather conditions\", \"Autobox\", \n", - " \"performance issue with a hardware component\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(hardware_keywords), na=False, case=False)\n", - " , 'Problem class'] = \"Hardware\" " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "soft_hardware_keywords = [\"timed out\", \"timeout\", \"too long\", \"incorrect detection\", \"Lost track\", \"Localization\", \n", - " \"geo-location related\", \"unsuccessful right turn\", \"unsuccessful left turn\", \"system\",\n", - " \"traffic conditions\", \"failed to detect an object correctly\", \" took longer than expected\",\n", - " \"main computer froze\", \"not braking correctly\", \"not speeding up correctly\",\n", - " \"not turning enough\", \"not slowing down enough\", \"didn't detect\", \"Sensor Fusion discrepancy\",\n", - " \"did not 
meet expectation\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(soft_hardware_keywords), na=False, case=False)\n", - " , 'Problem class'] = \"Software/Hardware\" " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "Traffic_ctrl_keywords = [\"unstable target lane\", \"Traffic light error\", \"Stop sign error\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(Traffic_ctrl_keywords), na=False, case=False)\n", - " , 'Problem class'] = \"Traffic control objects\" " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "road_user_keywords = [\"reckless driver\", \"behaving road user\", \"other road user\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(road_user_keywords), na=False, case=False)\n", - " , 'Problem class'] = \"Other road user\" " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "external_influences = [\"obstruction\", \"encroachment\", \"occluded view\", \"surface conditions\", \"wheater\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(external_influences), na=False, case=False)\n", - " , 'Problem class'] = \"External influences\" " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data_no_mv.loc[data_no_mv['Problem 
class'].isnull()\n", - " , 'Problem class'] = \"Other problems\" " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Manufacturer</th>\n", - " <th>Permit Number</th>\n", - " <th>DATE</th>\n", - " <th>VIN NUMBER</th>\n", - " <th>OPERATING WITHOUT DRIVER</th>\n", - " <th>DRIVER PRESENT</th>\n", - " <th>DISENGAGEMENT INITIATED BY</th>\n", - " <th>DISENGAGEMENT LOCATION</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT</th>\n", - " <th>Problem class</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>12.06.2018</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " <td>Traffic control objects</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>12.10.2018</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " <td>Traffic control objects</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>12.10.2018</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane 
change maneuver: risk of lane departure, ...</td>\n", - " <td>Traffic control objects</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>04.23.2019</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " <td>Other problems</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>05.14.2019</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver to the exit lane: risk of...</td>\n", - " <td>Software</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Manufacturer Permit Number DATE VIN NUMBER \\\n", - "0 AImotive Inc. AVT003 12.06.2018 JTDKN3DU5A1092792 \n", - "1 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 \n", - "2 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 \n", - "3 AImotive Inc. AVT003 04.23.2019 JTDKN3DU5A1092792 \n", - "4 AImotive Inc. AVT003 05.14.2019 JTDKN3DU5A1092792 \n", - "\n", - " OPERATING WITHOUT DRIVER DRIVER PRESENT DISENGAGEMENT INITIATED BY \\\n", - "0 No Yes Test Driver \n", - "1 No Yes Test Driver \n", - "2 No Yes Test Driver \n", - "3 No Yes Test Driver \n", - "4 No Yes Test Driver \n", - "\n", - " DISENGAGEMENT LOCATION FACTS CAUSING DISENGAGEMENT \\\n", - "0 Freeway Lane change maneuver: risk of lane departure, ... \n", - "1 Freeway Lane change maneuver: risk of lane departure, ... \n", - "2 Freeway Lane change maneuver: risk of lane departure, ... \n", - "3 Freeway Lane change maneuver: risk of lane departure, ... \n", - "4 Freeway Lane change maneuver to the exit lane: risk of... 
\n", - "\n", - " Problem class \n", - "0 Traffic control objects \n", - "1 Traffic control objects \n", - "2 Traffic control objects \n", - "3 Other problems \n", - "4 Software " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "data_no_mv.head()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "ename": "NameError", - "evalue": "name 'data_reduced_classified' is not defined", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n", - "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)\n", - "\u001b[0;32m/var/folders/kh/bds3ggxd09gbnhkp6c414s9h0000gn/T/ipykernel_20061/3854068531.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n", - "\u001b[1;32m 2\u001b[0m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtitle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Problems\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[1;32m 3\u001b[0m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mxticks\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrotation\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m45\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m----> 4\u001b[0;31m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbar\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mCounter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata_reduced_classified\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Problem class'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mkeys\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mCounter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata_reduced_classified\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Problem 
class'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0m\n", - "\u001b[0;31mNameError\u001b[0m: name 'data_reduced_classified' is not defined" - ] - }, - { - "data": { - "image/png": "", - "text/plain": [ - "<Figure size 432x288 with 1 Axes>" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "from collections import Counter\n", - "plt.title(\"Problems\")\n", - "plt.xticks(rotation = 45)\n", - "plt.bar(dict(Counter(data_reduced_classified['Problem class'])).keys(), dict(Counter(data_reduced_classified['Problem class'])).values())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Klassifikationsmodell" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data_preprocessed = data_no_mv.copy()\n", - "data_preprocessed['Problem class']=data_preprocessed['Problem class'].map(\n", - " {'Software':0,'Hardware':1,'Software/Hardware':2,\n", - " 'Traffic control objects':3,'Other road user':4, 'External influences':5,\n", - " 'Other problems':6})\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Manufacturer</th>\n", - " <th>Permit Number</th>\n", - " <th>DATE</th>\n", - " <th>VIN NUMBER</th>\n", - " <th>OPERATING WITHOUT DRIVER</th>\n", - " <th>DRIVER 
PRESENT</th>\n", - " <th>DISENGAGEMENT INITIATED BY</th>\n", - " <th>DISENGAGEMENT LOCATION</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT</th>\n", - " <th>Problem class</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>12.06.2018</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " <td>3</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>12.10.2018</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " <td>3</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>12.10.2018</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " <td>3</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>04.23.2019</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver: risk of lane departure, ...</td>\n", - " <td>6</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>AImotive Inc.</td>\n", - " <td>AVT003</td>\n", - " <td>05.14.2019</td>\n", - " <td>JTDKN3DU5A1092792</td>\n", - " <td>No</td>\n", - " <td>Yes</td>\n", - " <td>Test Driver</td>\n", - " <td>Freeway</td>\n", - " <td>Lane change maneuver to the exit lane: risk of...</td>\n", - " <td>0</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - 
"text/plain": [ - " Manufacturer Permit Number DATE VIN NUMBER \\\n", - "0 AImotive Inc. AVT003 12.06.2018 JTDKN3DU5A1092792 \n", - "1 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 \n", - "2 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 \n", - "3 AImotive Inc. AVT003 04.23.2019 JTDKN3DU5A1092792 \n", - "4 AImotive Inc. AVT003 05.14.2019 JTDKN3DU5A1092792 \n", - "\n", - " OPERATING WITHOUT DRIVER DRIVER PRESENT DISENGAGEMENT INITIATED BY \\\n", - "0 No Yes Test Driver \n", - "1 No Yes Test Driver \n", - "2 No Yes Test Driver \n", - "3 No Yes Test Driver \n", - "4 No Yes Test Driver \n", - "\n", - " DISENGAGEMENT LOCATION FACTS CAUSING DISENGAGEMENT \\\n", - "0 Freeway Lane change maneuver: risk of lane departure, ... \n", - "1 Freeway Lane change maneuver: risk of lane departure, ... \n", - "2 Freeway Lane change maneuver: risk of lane departure, ... \n", - "3 Freeway Lane change maneuver: risk of lane departure, ... \n", - "4 Freeway Lane change maneuver to the exit lane: risk of... 
\n", - "\n", - " Problem class \n", - "0 3 \n", - "1 3 \n", - "2 3 \n", - "3 6 \n", - "4 0 " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "data_preprocessed.head()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "<class 'pandas.core.frame.DataFrame'>\n", - "Int64Index: 8884 entries, 0 to 8884\n", - "Data columns (total 10 columns):\n", - " # Column Non-Null Count Dtype \n", - "--- ------ -------------- ----- \n", - " 0 Manufacturer 8884 non-null object\n", - " 1 Permit Number 8884 non-null object\n", - " 2 DATE 8884 non-null object\n", - " 3 VIN NUMBER 8884 non-null object\n", - " 4 OPERATING WITHOUT DRIVER 8884 non-null object\n", - " 5 DRIVER PRESENT 8884 non-null object\n", - " 6 DISENGAGEMENT INITIATED BY 8884 non-null object\n", - " 7 DISENGAGEMENT LOCATION 8884 non-null object\n", - " 8 FACTS CAUSING DISENGAGEMENT 8884 non-null object\n", - " 9 Problem class 8884 non-null int64 \n", - "dtypes: int64(1), object(9)\n", - "memory usage: 1021.5+ KB\n" - ] - } - ], - "source": [ - "data_preprocessed.info()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Problem class</th>\n", - " <th>Manufacturer_Apple Inc.</th>\n", - " <th>Manufacturer_Aurora Innovation, Inc.</th>\n", - " <th>Manufacturer_AutoX Technologies, Inc.</th>\n", - " <th>Manufacturer_BMW of North America</th>\n", - " <th>Manufacturer_Baidu USA LLC</th>\n", 
- " <th>Manufacturer_CRUISE LLC</th>\n", - " <th>Manufacturer_Drive.ai Inc</th>\n", - " <th>Manufacturer_Lyft</th>\n", - " <th>Manufacturer_Mercedes-Benz Research & Development North America, Inc.</th>\n", - " <th>...</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, \\nother road user behaving poorly</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, AV made unsuccessful left turn</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, third party lane encroachment</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, third party lane obstruction</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, AV lane change issues</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, AV made unsuccessful left turn</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, other road user behaving poorly</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane encroachment</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane obstruction</th>\n", - " <th>FACTS CAUSING DISENGAGEMENT_prediction discrepancy, a vehicle in the front was backing up, ego was not able to predict this behavior correctly.</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>3</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>...</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " 
<td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>3</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>...</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>3</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>...</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>6</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>...</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>...</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " <td>0</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "<p>5 rows × 
4538 columns</p>\n", - "</div>" - ], - "text/plain": [ - " Problem class Manufacturer_Apple Inc. \\\n", - "0 3 0 \n", - "1 3 0 \n", - "2 3 0 \n", - "3 6 0 \n", - "4 0 0 \n", - "\n", - " Manufacturer_Aurora Innovation, Inc. \\\n", - "0 0 \n", - "1 0 \n", - "2 0 \n", - "3 0 \n", - "4 0 \n", - "\n", - " Manufacturer_AutoX Technologies, Inc. Manufacturer_BMW of North America \\\n", - "0 0 0 \n", - "1 0 0 \n", - "2 0 0 \n", - "3 0 0 \n", - "4 0 0 \n", - "\n", - " Manufacturer_Baidu USA LLC Manufacturer_CRUISE LLC \\\n", - "0 0 0 \n", - "1 0 0 \n", - "2 0 0 \n", - "3 0 0 \n", - "4 0 0 \n", - "\n", - " Manufacturer_Drive.ai Inc Manufacturer_Lyft \\\n", - "0 0 0 \n", - "1 0 0 \n", - "2 0 0 \n", - "3 0 0 \n", - "4 0 0 \n", - "\n", - " Manufacturer_Mercedes-Benz Research & Development North America, Inc. ... \\\n", - "0 0 ... \n", - "1 0 ... \n", - "2 0 ... \n", - "3 0 ... \n", - "4 0 ... \n", - "\n", - " FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, \\nother road user behaving poorly \\\n", - "0 0 \n", - "1 0 \n", - "2 0 \n", - "3 0 \n", - "4 0 \n", - "\n", - " FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, AV made unsuccessful left turn \\\n", - "0 0 \n", - "1 0 \n", - "2 0 \n", - "3 0 \n", - "4 0 \n", - "\n", - " FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, third party lane encroachment \\\n", - "0 0 \n", - "1 0 \n", - "2 0 \n", - "3 0 \n", - "4 0 \n", - "\n", - " FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, third party lane obstruction \\\n", - "0 0 \n", - "1 0 \n", - "2 0 \n", - "3 0 \n", - "4 0 \n", - "\n", - " FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, AV lane change issues \\\n", - "0 0 \n", - "1 0 \n", - "2 0 \n", - "3 0 \n", - "4 0 \n", - 
"\n", - " FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, AV made unsuccessful left turn \\\n", - "0 0 \n", - "1 0 \n", - "2 0 \n", - "3 0 \n", - "4 0 \n", - "\n", - " FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, other road user behaving poorly \\\n", - "0 0 \n", - "1 0 \n", - "2 0 \n", - "3 0 \n", - "4 0 \n", - "\n", - " FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane encroachment \\\n", - "0 0 \n", - "1 0 \n", - "2 0 \n", - "3 0 \n", - "4 0 \n", - "\n", - " FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane obstruction \\\n", - "0 0 \n", - "1 0 \n", - "2 0 \n", - "3 0 \n", - "4 0 \n", - "\n", - " FACTS CAUSING DISENGAGEMENT_prediction discrepancy, a vehicle in the front was backing up, ego was not able to predict this behavior correctly. \n", - "0 0 \n", - "1 0 \n", - "2 0 \n", - "3 0 \n", - "4 0 \n", - "\n", - "[5 rows x 4538 columns]" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "data_with_dummies = pd.get_dummies(data_preprocessed, drop_first=True)\n", - "data_with_dummies.head()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Problem class 0\n", - "Manufacturer_Apple Inc. 0\n", - "Manufacturer_Aurora Innovation, Inc. 0\n", - "Manufacturer_AutoX Technologies, Inc. 
0\n", - "Manufacturer_BMW of North America 0\n", - " ..\n", - "FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, AV made unsuccessful left turn 0\n", - "FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, other road user behaving poorly 0\n", - "FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane encroachment 0\n", - "FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane obstruction 0\n", - "FACTS CAUSING DISENGAGEMENT_prediction discrepancy, a vehicle in the front was backing up, ego was not able to predict this behavior correctly. 0\n", - "Length: 4538, dtype: int64" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "data_with_dummies.isnull().sum()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.model_selection import train_test_split\n", - "from sklearn.metrics import confusion_matrix, accuracy_score, classification_report" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "target = data_with_dummies['Problem class']\n", - "inputs = data_with_dummies.drop(['Problem class'],axis=1)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "x_train, x_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2, random_state=365)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. 
Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).\n", - " warnings.warn(msg, FutureWarning)\n" - ] - }, - { - "data": { - "text/plain": [ - "<AxesSubplot:xlabel='Problem class', ylabel='Density'>" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "<Figure size 432x288 with 1 Axes>" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "sns.distplot((y_test),bins=50)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### K-Nearest Neighbors" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.neighbors import KNeighborsClassifier" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Text(0, 0.5, 'Error Rate')" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "<Figure size 720x432 with 1 Axes>" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "error_rate = []\n", - " \n", - "for i in range(1,10):\n", - " \n", - " KNN_model = KNeighborsClassifier(n_neighbors=i)\n", - " KNN_model.fit(x_train,y_train)\n", - " KNN_prediction = KNN_model.predict(x_test)\n", - " error_rate.append(np.mean(KNN_prediction != y_test)) #nur Fehler berücksichtigen\n", - "\n", - "plt.figure(figsize=(10,6))\n", - "plt.plot(range(1,10),error_rate,color='blue', linestyle='dashed', marker='o',\n", - " markerfacecolor='red', markersize=10)\n", - "plt.title('Error Rate vs. 
K Values')\n", - "plt.xlabel('K')\n", - "plt.ylabel('Error Rate')\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "KNN_model = KNeighborsClassifier(n_neighbors=3)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "KNeighborsClassifier(n_neighbors=3)" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "KNN_model.fit(x_train, y_train)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "KNN_prediction = KNN_model.predict(x_test)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.9712999437253799" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "accuracy_score(KNN_prediction, y_test)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " precision recall f1-score support\n", - "\n", - " 0 0.98 0.97 0.97 538\n", - " 1 0.82 0.82 0.82 17\n", - " 2 0.97 0.97 0.97 530\n", - " 3 0.00 0.00 0.00 0\n", - " 4 0.83 0.74 0.78 27\n", - " 5 0.70 1.00 0.82 7\n", - " 6 0.98 0.99 0.98 658\n", - "\n", - " accuracy 0.97 1777\n", - " macro avg 0.76 0.78 0.77 1777\n", - "weighted avg 0.97 0.97 0.97 1777\n", - "\n" - ] + "paragraph": "DataUnderstanding", + "slideshow": { + "slide_type": "" }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1308: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. 
Use `zero_division` parameter to control this behavior.\n", - " _warn_prf(average, modifier, msg_start, len(result))\n", - "/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1308: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.\n", - " _warn_prf(average, modifier, msg_start, len(result))\n", - "/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1308: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.\n", - " _warn_prf(average, modifier, msg_start, len(result))\n" - ] - } - ], - "source": [ - "print(classification_report(KNN_prediction, y_test))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Entscheidungsbaum" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.tree import DecisionTreeClassifier" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dtree = DecisionTreeClassifier()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "DecisionTreeClassifier()" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "dtree.fit(x_train,y_train)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "predictions = dtree.predict(x_test)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " precision recall f1-score support\n", - "\n", - " 0 0.99 0.99 0.99 531\n", - " 1 0.94 1.00 0.97 17\n", - " 2 0.99 0.98 
0.99 527\n", - " 3 1.00 1.00 1.00 1\n", - " 4 0.79 0.96 0.87 24\n", - " 5 1.00 0.80 0.89 10\n", - " 6 0.99 0.99 0.99 667\n", - "\n", - " accuracy 0.99 1777\n", - " macro avg 0.96 0.96 0.96 1777\n", - "weighted avg 0.99 0.99 0.99 1777\n", - "\n" - ] - } - ], - "source": [ - "print(classification_report(y_test,predictions))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": { - "include": true, - "paragraph": "DataUnderstanding" + "tags": [] }, "source": [ "# 2. Daten und Datenverständnis\n", @@ -6025,10 +3839,11 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.9" + "version": "3.12.3" }, + "skipNotebookInDeployment": false, "title": "Customer Churn Prediction", - "teaser": "In diesem Praxisbeispiel können Sie mithilfe eines Machine-Learning-Modells präzise vorhersagen, ob ein Kunde mit dem Service eines Luftfahrtunternehmens zufrieden sein wird, indem Sie Kundenzufriedenheitsdaten analysieren." + "teaser": "In diesem Praxisbeispiel können Sie mithilfe des Machine Learning Models herausfinden, wie Sie die Kundenabwanderung in der Telekommunikationsbranche präzise vorhersagen und proaktiv darauf reagieren können. " }, "nbformat": 4, "nbformat_minor": 4