diff --git a/Automotive/Improvement of components for autonomous motor vehicles/notebook.ipynb b/Automotive/Improvement of components for autonomous motor vehicles/notebook.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..922c709389dca2028c2e10d514c694b6cd4b9e20
--- /dev/null
+++ b/Automotive/Improvement of components for autonomous motor vehicles/notebook.ipynb	
@@ -0,0 +1,2023 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Business Understanding\n",
+    "This case study is intended to assist technology companies in developing and improving components for autonomous vehicles.\n",
+    "To this end, it poses the following question:\n",
+    "In which areas do the tested systems show weaknesses, and how can these weaknesses be classified?\n",
+    "The question is answered by classifying the reasons why the human test driver had to intervene in the system. The reasons are assigned to the following categories:\n",
+    "- Software problems\n",
+    "- Hardware problems\n",
+    "- Software and hardware problems (problems that occur in combination)\n",
+    "- Problems caused by traffic control objects (e.g. malfunctioning traffic lights)\n",
+    "- Problems caused by other road users\n",
+    "- External influences (e.g. blockages, occluded objects, weather and road conditions)\n",
+    "- Other problems"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Data and Data Understanding\n",
+    "The underlying dataset was obtained from Kaggle and records disengagements, i.e. interventions in which autonomous driving was ended, during the testing of autonomous vehicles.\n",
+    "It covers the period from December 1, 2018 to November 30, 2019 and is published under the U.S. Government Works license.\n",
+    "The data is processed in a Jupyter notebook, which is included in the widely used Anaconda data science distribution.\n",
+    "The dataset has 8885 rows and 9 columns.\n",
+    "The following data was recorded:\n",
+    "- Manufacturer\n",
+    "- Permit number\n",
+    "- Date\n",
+    "- Vehicle identification number (VIN)\n",
+    "- Whether the vehicle is capable of operating without a driver\n",
+    "- Whether a driver was present\n",
+    "- Who initiated the disengagement: AV system, test driver, remote operator, or passenger\n",
+    "- Location of the disengagement: interstate, freeway, highway, rural road, street, or parking facility\n",
+    "- Description of the facts causing the disengagement"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2.1. Import of Relevant Modules"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 76,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import statsmodels.api as sm\n",
+    "import matplotlib.pyplot as plt\n",
+    "import seaborn as sns\n",
+    "sns.set()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 77,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'2.0.3'"
+      ]
+     },
+     "execution_count": 77,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "pd.__version__"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2.2. Read Data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 78,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "raw_data = pd.read_csv('https://storage.googleapis.com/ml-service-repository-datastorage/Improvement_of_components_for_autonomous_motor_vehicles_data.csv')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 79,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Manufacturer</th>\n",
+       "      <th>Permit Number</th>\n",
+       "      <th>DATE</th>\n",
+       "      <th>VIN NUMBER</th>\n",
+       "      <th>VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\\n(Yes or No)</th>\n",
+       "      <th>DRIVER PRESENT\\n(Yes or No)</th>\n",
+       "      <th>DISENGAGEMENT INITIATED BY\\n(AV System, Test Driver, Remote Operator, or Passenger)</th>\n",
+       "      <th>DISENGAGEMENT\\nLOCATION\\n(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility)</th>\n",
+       "      <th>DESCRIPTION OF FACTS CAUSING DISENGAGEMENT</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>12.06.2018</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>12.10.2018</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>12.10.2018</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>04.23.2019</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>05.14.2019</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver to the exit lane: risk of...</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "    Manufacturer Permit Number        DATE         VIN NUMBER  \\\n",
+       "0  AImotive Inc.        AVT003  12.06.2018  JTDKN3DU5A1092792   \n",
+       "1  AImotive Inc.        AVT003  12.10.2018  JTDKN3DU5A1092792   \n",
+       "2  AImotive Inc.        AVT003  12.10.2018  JTDKN3DU5A1092792   \n",
+       "3  AImotive Inc.        AVT003  04.23.2019  JTDKN3DU5A1092792   \n",
+       "4  AImotive Inc.        AVT003  05.14.2019  JTDKN3DU5A1092792   \n",
+       "\n",
+       "  VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\\n(Yes or No)  \\\n",
+       "0                                                 No              \n",
+       "1                                                 No              \n",
+       "2                                                 No              \n",
+       "3                                                 No              \n",
+       "4                                                 No              \n",
+       "\n",
+       "  DRIVER PRESENT\\n(Yes or No)  \\\n",
+       "0                         Yes   \n",
+       "1                         Yes   \n",
+       "2                         Yes   \n",
+       "3                         Yes   \n",
+       "4                         Yes   \n",
+       "\n",
+       "  DISENGAGEMENT INITIATED BY\\n(AV System, Test Driver, Remote Operator, or Passenger)  \\\n",
+       "0                                        Test Driver                                    \n",
+       "1                                        Test Driver                                    \n",
+       "2                                        Test Driver                                    \n",
+       "3                                        Test Driver                                    \n",
+       "4                                        Test Driver                                    \n",
+       "\n",
+       "  DISENGAGEMENT\\nLOCATION\\n(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility)  \\\n",
+       "0                                            Freeway                                                 \n",
+       "1                                            Freeway                                                 \n",
+       "2                                            Freeway                                                 \n",
+       "3                                            Freeway                                                 \n",
+       "4                                            Freeway                                                 \n",
+       "\n",
+       "          DESCRIPTION OF FACTS CAUSING DISENGAGEMENT  \n",
+       "0  Lane change maneuver: risk of lane departure, ...  \n",
+       "1  Lane change maneuver: risk of lane departure, ...  \n",
+       "2  Lane change maneuver: risk of lane departure, ...  \n",
+       "3  Lane change maneuver: risk of lane departure, ...  \n",
+       "4  Lane change maneuver to the exit lane: risk of...  "
+      ]
+     },
+     "execution_count": 79,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Save a local copy of the raw data and preview the first five rows\n",
+    "raw_data.to_csv('rawdata.csv', index=False)\n",
+    "raw_data.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2.3. Data Cleaning"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 80,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<class 'pandas.core.frame.DataFrame'>\n",
+      "RangeIndex: 8885 entries, 0 to 8884\n",
+      "Data columns (total 9 columns):\n",
+      " #   Column                                                                                          Non-Null Count  Dtype \n",
+      "---  ------                                                                                          --------------  ----- \n",
+      " 0   Manufacturer                                                                                    8885 non-null   object\n",
+      " 1   Permit Number                                                                                   8885 non-null   object\n",
+      " 2   DATE                                                                                            8884 non-null   object\n",
+      " 3   VIN NUMBER                                                                                      8884 non-null   object\n",
+      " 4   VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\n",
+      "(Yes or No)                                    8884 non-null   object\n",
+      " 5   DRIVER PRESENT\n",
+      "(Yes or No)                                                                      8884 non-null   object\n",
+      " 6   DISENGAGEMENT INITIATED BY\n",
+      "(AV System, Test Driver, Remote Operator, or Passenger)              8884 non-null   object\n",
+      " 7   DISENGAGEMENT\n",
+      "LOCATION\n",
+      "(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility)  8884 non-null   object\n",
+      " 8   DESCRIPTION OF FACTS CAUSING DISENGAGEMENT                                                      8884 non-null   object\n",
+      "dtypes: object(9)\n",
+      "memory usage: 624.9+ KB\n"
+     ]
+    }
+   ],
+   "source": [
+    "raw_data.info()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 81,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Manufacturer</th>\n",
+       "      <th>Permit Number</th>\n",
+       "      <th>DATE</th>\n",
+       "      <th>VIN NUMBER</th>\n",
+       "      <th>VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\\n(Yes or No)</th>\n",
+       "      <th>DRIVER PRESENT\\n(Yes or No)</th>\n",
+       "      <th>DISENGAGEMENT INITIATED BY\\n(AV System, Test Driver, Remote Operator, or Passenger)</th>\n",
+       "      <th>DISENGAGEMENT\\nLOCATION\\n(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility)</th>\n",
+       "      <th>DESCRIPTION OF FACTS CAUSING DISENGAGEMENT</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>count</th>\n",
+       "      <td>8885</td>\n",
+       "      <td>8885</td>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>unique</th>\n",
+       "      <td>28</td>\n",
+       "      <td>27</td>\n",
+       "      <td>3711</td>\n",
+       "      <td>289</td>\n",
+       "      <td>5</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>11</td>\n",
+       "      <td>469</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>top</th>\n",
+       "      <td>Toyota Research Institute</td>\n",
+       "      <td>AVT050</td>\n",
+       "      <td>3/28/2019</td>\n",
+       "      <td>JTHDU1EF3G5020098</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Street</td>\n",
+       "      <td>Safety Driver proactive disengagement.</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>freq</th>\n",
+       "      <td>2947</td>\n",
+       "      <td>2947</td>\n",
+       "      <td>59</td>\n",
+       "      <td>900</td>\n",
+       "      <td>4369</td>\n",
+       "      <td>4934</td>\n",
+       "      <td>6037</td>\n",
+       "      <td>4668</td>\n",
+       "      <td>1780</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                     Manufacturer Permit Number       DATE         VIN NUMBER  \\\n",
+       "count                        8885          8885       8884               8884   \n",
+       "unique                         28            27       3711                289   \n",
+       "top     Toyota Research Institute        AVT050  3/28/2019  JTHDU1EF3G5020098   \n",
+       "freq                         2947          2947         59                900   \n",
+       "\n",
+       "       VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\\n(Yes or No)  \\\n",
+       "count                                                8884              \n",
+       "unique                                                  5              \n",
+       "top                                                    No              \n",
+       "freq                                                 4369              \n",
+       "\n",
+       "       DRIVER PRESENT\\n(Yes or No)  \\\n",
+       "count                         8884   \n",
+       "unique                           4   \n",
+       "top                            Yes   \n",
+       "freq                          4934   \n",
+       "\n",
+       "       DISENGAGEMENT INITIATED BY\\n(AV System, Test Driver, Remote Operator, or Passenger)  \\\n",
+       "count                                                8884                                    \n",
+       "unique                                                  4                                    \n",
+       "top                                           Test Driver                                    \n",
+       "freq                                                 6037                                    \n",
+       "\n",
+       "       DISENGAGEMENT\\nLOCATION\\n(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility)  \\\n",
+       "count                                                8884                                                 \n",
+       "unique                                                 11                                                 \n",
+       "top                                                Street                                                 \n",
+       "freq                                                 4668                                                 \n",
+       "\n",
+       "       DESCRIPTION OF FACTS CAUSING DISENGAGEMENT  \n",
+       "count                                        8884  \n",
+       "unique                                        469  \n",
+       "top        Safety Driver proactive disengagement.  \n",
+       "freq                                         1780  "
+      ]
+     },
+     "execution_count": 81,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "raw_data.describe(include=\"all\")"
+   ]
+  },
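+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `DATE` column is stored as text (`object`) and, judging by the values shown above, mixes at least two formats: `12.06.2018` and `04.23.2019` (month.day.year) as well as `3/28/2019` (month/day/year). Its 3711 unique values for a one-year period also suggest that the strings are not uniformly formatted.\n",
+    "The following cell is a minimal sketch, not part of the original analysis, that tries each assumed format in turn with `pd.to_datetime(..., errors='coerce')`. The format list `DATE_FORMATS` and the helper `parse_mixed_dates` are hypothetical names, further formats may be needed for the full column, and values that match none of them remain `NaT`. On the notebook's data it could be called as `parse_mixed_dates(raw_data['DATE'])`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Assumed formats, inferred only from the sample values shown above;\n",
+    "# the full DATE column may contain further variants.\n",
+    "DATE_FORMATS = ['%m.%d.%Y', '%m/%d/%Y']\n",
+    "\n",
+    "\n",
+    "def parse_mixed_dates(values: pd.Series) -> pd.Series:\n",
+    "    # Start with every entry unparsed (NaT) and fill in matches format by format\n",
+    "    parsed = pd.Series(pd.NaT, index=values.index, dtype='datetime64[ns]')\n",
+    "    for fmt in DATE_FORMATS:\n",
+    "        # errors='coerce' turns strings that do not match fmt into NaT\n",
+    "        candidate = pd.to_datetime(values, format=fmt, errors='coerce')\n",
+    "        parsed = parsed.fillna(candidate)\n",
+    "    return parsed\n",
+    "\n",
+    "\n",
+    "sample = pd.Series(['12.06.2018', '04.23.2019', '3/28/2019', 'unknown'])\n",
+    "print(parse_mixed_dates(sample))"
+   ]
+  },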
+  {
+   "cell_type": "code",
+   "execution_count": 82,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Replace the long, multi-line column headers with short names\n",
+    "short_names = {\n",
+    "    'VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\\n(Yes or No)': 'OPERATING WITHOUT DRIVER',\n",
+    "    'DRIVER PRESENT\\n(Yes or No)': 'DRIVER PRESENT',\n",
+    "    'DISENGAGEMENT INITIATED BY\\n(AV System, Test Driver, Remote Operator, or Passenger)': 'DISENGAGEMENT INITIATED BY',\n",
+    "    'DISENGAGEMENT\\nLOCATION\\n(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility)': 'DISENGAGEMENT LOCATION',\n",
+    "    'DESCRIPTION OF FACTS CAUSING DISENGAGEMENT': 'FACTS CAUSING DISENGAGEMENT',\n",
+    "}\n",
+    "raw_data.rename(columns=short_names, inplace=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Missing Values"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 83,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Manufacturer                   0\n",
+       "Permit Number                  0\n",
+       "DATE                           1\n",
+       "VIN NUMBER                     1\n",
+       "OPERATING WITHOUT DRIVER       1\n",
+       "DRIVER PRESENT                 1\n",
+       "DISENGAGEMENT INITIATED BY     1\n",
+       "DISENGAGEMENT LOCATION         1\n",
+       "FACTS CAUSING DISENGAGEMENT    1\n",
+       "dtype: int64"
+      ]
+     },
+     "execution_count": 83,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "raw_data.isnull().sum()"
+   ]
+  },
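+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Only `Manufacturer` and `Permit Number` are complete; every other column is missing exactly one value, which suggests the gaps may sit in a single row. Before dropping anything, it can help to look at the affected rows.\n",
+    "The cell below is a minimal sketch with a hypothetical helper `rows_with_missing_values` and a small made-up frame (not taken from the dataset); on the notebook's data it would be called as `rows_with_missing_values(raw_data)`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "\n",
+    "def rows_with_missing_values(df: pd.DataFrame) -> pd.DataFrame:\n",
+    "    # Boolean mask that is True for rows with at least one missing cell\n",
+    "    mask = df.isnull().any(axis=1)\n",
+    "    return df[mask]\n",
+    "\n",
+    "\n",
+    "# Made-up example data, not taken from the dataset\n",
+    "toy = pd.DataFrame({\n",
+    "    'Manufacturer': ['A', 'B', 'C'],\n",
+    "    'DATE': ['12.06.2018', '3/28/2019', np.nan],\n",
+    "})\n",
+    "print(rows_with_missing_values(toy))"
+   ]
+  },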
+  {
+   "cell_type": "code",
+   "execution_count": 84,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Manufacturer                   0\n",
+       "Permit Number                  0\n",
+       "DATE                           0\n",
+       "VIN NUMBER                     0\n",
+       "OPERATING WITHOUT DRIVER       0\n",
+       "DRIVER PRESENT                 0\n",
+       "DISENGAGEMENT INITIATED BY     0\n",
+       "DISENGAGEMENT LOCATION         0\n",
+       "FACTS CAUSING DISENGAGEMENT    0\n",
+       "dtype: int64"
+      ]
+     },
+     "execution_count": 84,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Drop every row that contains a missing value, then confirm none remain\n",
+    "data_no_mv = raw_data.dropna(axis=0)\n",
+    "data_no_mv.isnull().sum()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 85,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Manufacturer</th>\n",
+       "      <th>Permit Number</th>\n",
+       "      <th>DATE</th>\n",
+       "      <th>VIN NUMBER</th>\n",
+       "      <th>OPERATING WITHOUT DRIVER</th>\n",
+       "      <th>DRIVER PRESENT</th>\n",
+       "      <th>DISENGAGEMENT INITIATED BY</th>\n",
+       "      <th>DISENGAGEMENT LOCATION</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>count</th>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "      <td>8884</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>unique</th>\n",
+       "      <td>27</td>\n",
+       "      <td>26</td>\n",
+       "      <td>3711</td>\n",
+       "      <td>289</td>\n",
+       "      <td>5</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>11</td>\n",
+       "      <td>469</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>top</th>\n",
+       "      <td>Toyota Research Institute</td>\n",
+       "      <td>AVT050</td>\n",
+       "      <td>3/28/2019</td>\n",
+       "      <td>JTHDU1EF3G5020098</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Street</td>\n",
+       "      <td>Safety Driver proactive disengagement.</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>freq</th>\n",
+       "      <td>2947</td>\n",
+       "      <td>2947</td>\n",
+       "      <td>59</td>\n",
+       "      <td>900</td>\n",
+       "      <td>4369</td>\n",
+       "      <td>4934</td>\n",
+       "      <td>6037</td>\n",
+       "      <td>4668</td>\n",
+       "      <td>1780</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                     Manufacturer Permit Number       DATE         VIN NUMBER  \\\n",
+       "count                        8884          8884       8884               8884   \n",
+       "unique                         27            26       3711                289   \n",
+       "top     Toyota Research Institute        AVT050  3/28/2019  JTHDU1EF3G5020098   \n",
+       "freq                         2947          2947         59                900   \n",
+       "\n",
+       "       OPERATING WITHOUT DRIVER DRIVER PRESENT DISENGAGEMENT INITIATED BY  \\\n",
+       "count                      8884           8884                       8884   \n",
+       "unique                        5              4                          4   \n",
+       "top                          No            Yes                Test Driver   \n",
+       "freq                       4369           4934                       6037   \n",
+       "\n",
+       "       DISENGAGEMENT LOCATION             FACTS CAUSING DISENGAGEMENT  \n",
+       "count                    8884                                    8884  \n",
+       "unique                     11                                     469  \n",
+       "top                    Street  Safety Driver proactive disengagement.  \n",
+       "freq                     4668                                    1780  "
+      ]
+     },
+     "execution_count": 85,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_no_mv.describe(include='all')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Data Preparation and Labeling"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 86,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "DISENGAGEMENT INITIATED BY\n",
+       "Test Driver         6037\n",
+       "AV System           2698\n",
+       "Vehicle Operator      81\n",
+       "Test driver           68\n",
+       "Name: count, dtype: int64"
+      ]
+     },
+     "execution_count": 86,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_no_mv['DISENGAGEMENT INITIATED BY'].value_counts()"
+   ]
+  },
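+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The counts above list `Test Driver` and `Test driver` separately although they differ only in capitalization. The `describe` output points to similar inconsistencies elsewhere, e.g. 5 distinct values in the Yes/No column `OPERATING WITHOUT DRIVER` and 11 distinct locations against the six named in the original column header.\n",
+    "A minimal sketch of case- and whitespace-insensitive normalization follows. `CANONICAL` and `normalize_labels` are hypothetical names, and whether `Vehicle Operator` should be merged with `Test Driver` is a domain decision the sketch does not make. On the notebook's data it could be applied as `normalize_labels(data_no_mv['DISENGAGEMENT INITIATED BY'], CANONICAL)`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Preferred spelling of each label (assumed from the value counts above)\n",
+    "CANONICAL = ['Test Driver', 'AV System', 'Vehicle Operator']\n",
+    "\n",
+    "\n",
+    "def normalize_labels(values: pd.Series, canonical: list) -> pd.Series:\n",
+    "    # Lookup from a trimmed, lower-cased key to the preferred spelling\n",
+    "    lookup = {label.strip().lower(): label for label in canonical}\n",
+    "    keys = values.str.strip().str.lower()\n",
+    "    # Values without a canonical counterpart are kept unchanged\n",
+    "    return keys.map(lookup).fillna(values)\n",
+    "\n",
+    "\n",
+    "sample = pd.Series(['Test Driver', 'Test driver', 'AV System', 'Vehicle Operator'])\n",
+    "print(normalize_labels(sample, CANONICAL).value_counts())"
+   ]
+  },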
+  {
+   "cell_type": "code",
+   "execution_count": 87,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "469"
+      ]
+     },
+     "execution_count": 87,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_no_mv['FACTS CAUSING DISENGAGEMENT'].nunique()"
+   ]
+  },
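+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "With 469 distinct free-text descriptions, the causes have to be grouped before they can be assigned to the categories from section 1. One possible starting point, sketched below under clear assumptions, is a rule-based keyword match. The keyword lists in `CATEGORY_KEYWORDS` are hypothetical, chosen loosely from phrases such as `software fault`, `planner fault` and `Object Perception` that occur in the descriptions, and would need manual review; mapping planning and perception issues to software problems is likewise an assumption.\n",
+    "The first matching category wins, so the combined category is checked before its parts. On the notebook's data the helper could be applied as `data_no_mv['FACTS CAUSING DISENGAGEMENT'].map(classify_description)`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Hypothetical keyword rules for illustration only; dict order sets priority,\n",
+    "# so the combined category is tested before software or hardware alone.\n",
+    "CATEGORY_KEYWORDS = {\n",
+    "    'Software and hardware problems': ['software and hardware', 'hardware and software'],\n",
+    "    'Software problems': ['software', 'planner', 'planning', 'perception'],\n",
+    "    'Hardware problems': ['hardware', 'sensor'],\n",
+    "    'Problems caused by traffic control objects': ['traffic light', 'traffic signal'],\n",
+    "    'Problems caused by other road users': ['pedestrian', 'cyclist'],\n",
+    "    'External influences': ['weather', 'construction', 'blocked'],\n",
+    "}\n",
+    "\n",
+    "\n",
+    "def classify_description(text) -> str:\n",
+    "    # Return the first category whose keyword occurs in the lower-cased text\n",
+    "    lowered = str(text).lower()\n",
+    "    for category, keywords in CATEGORY_KEYWORDS.items():\n",
+    "        if any(keyword in lowered for keyword in keywords):\n",
+    "            return category\n",
+    "    return 'Other problems'\n",
+    "\n",
+    "\n",
+    "sample = pd.Series([\n",
+    "    'Automatic disengagement caused by planner fault.',\n",
+    "    'Safety Driver proactive disengagement.',\n",
+    "])\n",
+    "print(sample.map(classify_description))"
+   ]
+  },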
+  {
+   "cell_type": "code",
+   "execution_count": 88,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Manufacturer</th>\n",
+       "      <th>Permit Number</th>\n",
+       "      <th>DATE</th>\n",
+       "      <th>VIN NUMBER</th>\n",
+       "      <th>OPERATING WITHOUT DRIVER</th>\n",
+       "      <th>DRIVER PRESENT</th>\n",
+       "      <th>DISENGAGEMENT INITIATED BY</th>\n",
+       "      <th>DISENGAGEMENT LOCATION</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>12.06.2018</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>12.10.2018</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>12.10.2018</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>04.23.2019</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>05.14.2019</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver to the exit lane: risk of...</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "    Manufacturer Permit Number        DATE         VIN NUMBER  \\\n",
+       "0  AImotive Inc.        AVT003  12.06.2018  JTDKN3DU5A1092792   \n",
+       "1  AImotive Inc.        AVT003  12.10.2018  JTDKN3DU5A1092792   \n",
+       "2  AImotive Inc.        AVT003  12.10.2018  JTDKN3DU5A1092792   \n",
+       "3  AImotive Inc.        AVT003  04.23.2019  JTDKN3DU5A1092792   \n",
+       "4  AImotive Inc.        AVT003  05.14.2019  JTDKN3DU5A1092792   \n",
+       "\n",
+       "  OPERATING WITHOUT DRIVER DRIVER PRESENT DISENGAGEMENT INITIATED BY  \\\n",
+       "0                       No            Yes                Test Driver   \n",
+       "1                       No            Yes                Test Driver   \n",
+       "2                       No            Yes                Test Driver   \n",
+       "3                       No            Yes                Test Driver   \n",
+       "4                       No            Yes                Test Driver   \n",
+       "\n",
+       "  DISENGAGEMENT LOCATION                        FACTS CAUSING DISENGAGEMENT  \n",
+       "0                Freeway  Lane change maneuver: risk of lane departure, ...  \n",
+       "1                Freeway  Lane change maneuver: risk of lane departure, ...  \n",
+       "2                Freeway  Lane change maneuver: risk of lane departure, ...  \n",
+       "3                Freeway  Lane change maneuver: risk of lane departure, ...  \n",
+       "4                Freeway  Lane change maneuver to the exit lane: risk of...  "
+      ]
+     },
+     "execution_count": 88,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_no_mv.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 89,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "FACTS CAUSING DISENGAGEMENT\n",
+       "Safety Driver proactive disengagement.                                                                                                                                  1780\n",
+       "Disengage for unwanted maneuver of the vehicle caused by a planning discrepancy while generating an appropriate trajectory                                               805\n",
+       "Automatic disengagement caused by planner fault.                                                                                                                         742\n",
+       "Disengage due to operator discomfort                                                                                                                                     636\n",
+       "Disengage for a software fault due to a potential performance issue with a software component of the self-driving system (including third party software components)     482\n",
+       "                                                                                                                                                                        ... \n",
+       "Planning Logic: planner inadequately yields for cross traffic agent with right-of-way                                                                                      1\n",
+       "Planning Logic: planned trajectory fails to avoid vehicle stopped ahead intersection                                                                                       1\n",
+       "Object Perception: inaccurate perception of animal slowly crossing road leads to planned trajectory overlap                                                                1\n",
+       "Planning Logic: incorrect behavior prediction for oncoming vehicle results in a planned trajectory that overlaps with the vehicle                                          1\n",
+       "Planning discrepancy; system planned incorrect trajectory to avoid oncoming traffic                                                                                        1\n",
+       "Name: count, Length: 469, dtype: int64"
+      ]
+     },
+     "execution_count": 89,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_no_mv['FACTS CAUSING DISENGAGEMENT'].value_counts()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 90,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "DISENGAGEMENT INITIATED BY\n",
+       "Test Driver         6037\n",
+       "AV System           2698\n",
+       "Vehicle Operator      81\n",
+       "Test driver           68\n",
+       "Name: count, dtype: int64"
+      ]
+     },
+     "execution_count": 90,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_no_mv['DISENGAGEMENT INITIATED BY'].value_counts()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 91,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_reduced = data_no_mv.loc[0:1999, :]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 92,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<class 'pandas.core.frame.DataFrame'>\n",
+      "Index: 2000 entries, 0 to 1999\n",
+      "Data columns (total 9 columns):\n",
+      " #   Column                       Non-Null Count  Dtype \n",
+      "---  ------                       --------------  ----- \n",
+      " 0   Manufacturer                 2000 non-null   object\n",
+      " 1   Permit Number                2000 non-null   object\n",
+      " 2   DATE                         2000 non-null   object\n",
+      " 3   VIN NUMBER                   2000 non-null   object\n",
+      " 4   OPERATING WITHOUT DRIVER     2000 non-null   object\n",
+      " 5   DRIVER PRESENT               2000 non-null   object\n",
+      " 6   DISENGAGEMENT INITIATED BY   2000 non-null   object\n",
+      " 7   DISENGAGEMENT LOCATION       2000 non-null   object\n",
+      " 8   FACTS CAUSING DISENGAGEMENT  2000 non-null   object\n",
+      "dtypes: object(9)\n",
+      "memory usage: 156.2+ KB\n"
+     ]
+    }
+   ],
+   "source": [
+    "data_reduced.info()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 93,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "92"
+      ]
+     },
+     "execution_count": 93,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_reduced['FACTS CAUSING DISENGAGEMENT'].nunique()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 94,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Reassign instead of using inplace=True on a slice, which triggers SettingWithCopyWarning\n",
+    "data_reduced = data_reduced.drop_duplicates(subset=\"FACTS CAUSING DISENGAGEMENT\",\n",
+    "                                            keep='first')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 95,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_reduced = data_reduced.reset_index()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 96,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_reduced['FACTS CAUSING DISENGAGEMENT'] = data_reduced['FACTS CAUSING DISENGAGEMENT'].astype('str')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 97,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "dtype('O')"
+      ]
+     },
+     "execution_count": 97,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_reduced['FACTS CAUSING DISENGAGEMENT'].dtype"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 98,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import re"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 99,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "software_keywords = [\"software performance\", \"software fault\", \"software discrepancy\",\n",
+    "                     \"trajectory planning\", \"planning discrepancy\", \"planning error\",\n",
+    "                     \"wrong lane change suggestion\", \"wrong lane association\", \"data recording\",\n",
+    "                     \"improper lane-change plan\", \"undesirable manuever\", \"undesirable yielding maneuver\",\n",
+    "                     \"outside of rate requirements\", \"merged poorly\", \"mapping issue\", \"software issue\",\n",
+    "                    \"poor trajectory across lanes\", \"incorrect assessment\", \"incorrect behavior\",\n",
+    "                    \"unprotected\", \"Poor lane change\", \"very wide\", \"wrong object prediction\", \"undesired motion\",\n",
+    "                    \"unwanted maneuver\", \"perception discrepancy\", \"ghost object prediction\",\n",
+    "                    \"driving faster than driver expected\", \"expected path\",\n",
+    "                    \"not initialized correctly\", \"software module\", \"perception mismatch\", \"estimation\",\n",
+    "                    \"planner fault\", \"unstable\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 100,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Make data_no_mv an independent frame so the column assignments below do not raise SettingWithCopyWarning\n",
+    "data_no_mv = data_no_mv.copy()\n",
+    "data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(software_keywords), na=False, case=False)\n",
+    "                  , 'Problem class'] = \"Software\" "
+   ]
+  },
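+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The keyword cells join each list into one regular expression with `'|'.join(...)` and test it with `Series.str.contains`, so a row matches as soon as any keyword occurs anywhere in its description. A minimal sketch of that mechanism on two descriptions from the value counts above plus a missing value (`facts` and the short `keywords` list are illustrative only). It additionally passes each keyword through `re.escape`, an optional addition not used elsewhere in this notebook, so that characters with a regex meaning such as `.`, `(` or `+` would be matched literally if a keyword ever contained them:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import re\n",
+    "import pandas as pd\n",
+    "\n",
+    "# Two descriptions from the value counts above; None mimics a missing value\n",
+    "facts = pd.Series(['Automatic disengagement caused by planner fault.',\n",
+    "                   'Disengage due to operator discomfort',\n",
+    "                   None])\n",
+    "keywords = ['planner fault', 'software fault']\n",
+    "\n",
+    "# Escape each keyword so it is matched literally, then OR them together\n",
+    "pattern = '|'.join(re.escape(k) for k in keywords)\n",
+    "\n",
+    "# case=False ignores capitalisation; na=False counts missing text as no match\n",
+    "facts.str.contains(pattern, case=False, na=False).tolist()  # [True, False, False]"
+   ]
+  },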
+  {
+   "cell_type": "code",
+   "execution_count": 101,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "hardware_keywords = [\"hardware performance\", \"hardware diagnostics\", \"controls diagnostics\", \"actor\", \n",
+    "                    \"yield to other actors\", \"Hardware irregularity\", \"weather conditions\", \"Autobox\", \n",
+    "                    \"performance issue with a hardware component\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 102,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(hardware_keywords), na=False, case=False)\n",
+    "                  , 'Problem class'] = \"Hardware\" "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 103,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "soft_hardware_keywords = [\"timed out\", \"timeout\", \"too long\", \"incorrect detection\", \"Lost track\", \"Localization\", \n",
+    "                         \"geo-location related\", \"unsuccessful right turn\", \"unsuccessful left turn\", \"system\",\n",
+    "                         \"traffic conditions\", \"failed to detect an object correctly\", \" took longer than expected\",\n",
+    "                         \"main computer froze\", \"not braking correctly\", \"not speeding up correctly\",\n",
+    "                         \"not turning enough\", \"not slowing down enough\", \"didn't detect\", \"Sensor Fusion discrepancy\",\n",
+    "                         \"did not meet expectation\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 104,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(soft_hardware_keywords), na=False, case=False)\n",
+    "                  , 'Problem class'] = \"Software/Hardware\" "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 105,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "Traffic_ctrl_keywords = [\"unstable target lane\", \"Traffic light error\", \"Stop sign error\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 106,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(Traffic_ctrl_keywords), na=False, case=False)\n",
+    "                  , 'Problem class'] = \"Traffic control objects\" "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 107,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "road_user_keywords = [\"reckless driver\", \"behaving road user\", \"other road user\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 108,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(road_user_keywords), na=False, case=False)\n",
+    "                  , 'Problem class'] = \"Other road user\" "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 109,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "external_influences = [\"obstruction\", \"encroachment\", \"occluded view\", \"surface conditions\", \"weather\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 110,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(external_influences), na=False, case=False)\n",
+    "                  , 'Problem class'] = \"External influences\" "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 111,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_no_mv.loc[data_no_mv['Problem class'].isnull()\n",
+    "                  , 'Problem class'] = \"Other problems\" "
+   ]
+  },
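+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Each `.loc[...] = ...` assignment above overwrites the label of every matching row, including rows an earlier cell already labelled, so the order of these cells decides which class wins when a description contains keywords from several lists. For example, the frequent description beginning `Disengage for a software fault` also contains `self-driving system`, so it matches both `software fault` (Software) and `system` (Software/Hardware) and ends up as Software/Hardware. A minimal sketch that makes this last-match-wins precedence explicit; the `classify` helper and the shortened `rules` list are hypothetical and only mirror the order used here:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Shortened, illustrative keyword lists in the same order as the cells above\n",
+    "rules = [\n",
+    "    ('Software', ['software fault', 'planner fault']),\n",
+    "    ('Software/Hardware', ['system', 'timeout']),\n",
+    "    ('External influences', ['obstruction', 'weather']),\n",
+    "]\n",
+    "\n",
+    "def classify(text, rules, default='Other problems'):\n",
+    "    # Hypothetical helper: a later matching rule overwrites an earlier one,\n",
+    "    # just like the successive .loc assignments\n",
+    "    text = str(text).lower()\n",
+    "    label = default\n",
+    "    for name, keywords in rules:\n",
+    "        if any(k in text for k in keywords):\n",
+    "            label = name\n",
+    "    return label\n",
+    "\n",
+    "examples = pd.Series([\n",
+    "    'Disengage for a software fault due to a potential performance issue with a software component of the self-driving system (including third party software components)',\n",
+    "    'Automatic disengagement caused by planner fault.',\n",
+    "    'Disengage due to operator discomfort',\n",
+    "])\n",
+    "examples.map(lambda t: classify(t, rules)).tolist()\n",
+    "# ['Software/Hardware', 'Software', 'Other problems']"
+   ]
+  },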
+  {
+   "cell_type": "code",
+   "execution_count": 112,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Manufacturer</th>\n",
+       "      <th>Permit Number</th>\n",
+       "      <th>DATE</th>\n",
+       "      <th>VIN NUMBER</th>\n",
+       "      <th>OPERATING WITHOUT DRIVER</th>\n",
+       "      <th>DRIVER PRESENT</th>\n",
+       "      <th>DISENGAGEMENT INITIATED BY</th>\n",
+       "      <th>DISENGAGEMENT LOCATION</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT</th>\n",
+       "      <th>Problem class</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>12.06.2018</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "      <td>Traffic control objects</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>12.10.2018</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "      <td>Traffic control objects</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>12.10.2018</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "      <td>Traffic control objects</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>04.23.2019</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "      <td>Other problems</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>05.14.2019</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver to the exit lane: risk of...</td>\n",
+       "      <td>Software</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "    Manufacturer Permit Number        DATE         VIN NUMBER  \\\n",
+       "0  AImotive Inc.        AVT003  12.06.2018  JTDKN3DU5A1092792   \n",
+       "1  AImotive Inc.        AVT003  12.10.2018  JTDKN3DU5A1092792   \n",
+       "2  AImotive Inc.        AVT003  12.10.2018  JTDKN3DU5A1092792   \n",
+       "3  AImotive Inc.        AVT003  04.23.2019  JTDKN3DU5A1092792   \n",
+       "4  AImotive Inc.        AVT003  05.14.2019  JTDKN3DU5A1092792   \n",
+       "\n",
+       "  OPERATING WITHOUT DRIVER DRIVER PRESENT DISENGAGEMENT INITIATED BY  \\\n",
+       "0                       No            Yes                Test Driver   \n",
+       "1                       No            Yes                Test Driver   \n",
+       "2                       No            Yes                Test Driver   \n",
+       "3                       No            Yes                Test Driver   \n",
+       "4                       No            Yes                Test Driver   \n",
+       "\n",
+       "  DISENGAGEMENT LOCATION                        FACTS CAUSING DISENGAGEMENT  \\\n",
+       "0                Freeway  Lane change maneuver: risk of lane departure, ...   \n",
+       "1                Freeway  Lane change maneuver: risk of lane departure, ...   \n",
+       "2                Freeway  Lane change maneuver: risk of lane departure, ...   \n",
+       "3                Freeway  Lane change maneuver: risk of lane departure, ...   \n",
+       "4                Freeway  Lane change maneuver to the exit lane: risk of...   \n",
+       "\n",
+       "             Problem class  \n",
+       "0  Traffic control objects  \n",
+       "1  Traffic control objects  \n",
+       "2  Traffic control objects  \n",
+       "3           Other problems  \n",
+       "4                 Software  "
+      ]
+     },
+     "execution_count": 112,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_no_mv.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Classification model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 113,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_preprocessed = data_no_mv.copy()\n",
+    "data_preprocessed['Problem class']=data_preprocessed['Problem class'].map(\n",
+    "    {'Software':0,'Hardware':1,'Software/Hardware':2,\n",
+    "    'Traffic control objects':3,'Other road user':4, 'External influences':5,\n",
+    "     'Other problems':6})\n"
+   ]
+  },
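+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`Series.map` with a dict returns `NaN` for any label that is not one of its keys, which would also turn the column into `float64`. The `info()` output further down shows `Problem class` as `int64` with 8884 non-null values, so every label was covered here. A minimal check that would catch an unmapped label early; `label_codes` repeats the mapping above, while the misspelt `'Hardwre'` label is a made-up example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Same label-to-code mapping as in the cell above\n",
+    "label_codes = {'Software': 0, 'Hardware': 1, 'Software/Hardware': 2,\n",
+    "               'Traffic control objects': 3, 'Other road user': 4,\n",
+    "               'External influences': 5, 'Other problems': 6}\n",
+    "\n",
+    "# Made-up labels; the last one is deliberately misspelt\n",
+    "labels = pd.Series(['Software', 'Other problems', 'Hardwre'])\n",
+    "codes = labels.map(label_codes)\n",
+    "\n",
+    "# Labels that received no code (NaN) after mapping\n",
+    "labels[codes.isna()].unique().tolist()  # ['Hardwre']"
+   ]
+  },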
+  {
+   "cell_type": "code",
+   "execution_count": 114,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Manufacturer</th>\n",
+       "      <th>Permit Number</th>\n",
+       "      <th>DATE</th>\n",
+       "      <th>VIN NUMBER</th>\n",
+       "      <th>OPERATING WITHOUT DRIVER</th>\n",
+       "      <th>DRIVER PRESENT</th>\n",
+       "      <th>DISENGAGEMENT INITIATED BY</th>\n",
+       "      <th>DISENGAGEMENT LOCATION</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT</th>\n",
+       "      <th>Problem class</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>12.06.2018</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "      <td>3</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>12.10.2018</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "      <td>3</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>12.10.2018</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "      <td>3</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>04.23.2019</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver: risk of lane departure, ...</td>\n",
+       "      <td>6</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>AImotive Inc.</td>\n",
+       "      <td>AVT003</td>\n",
+       "      <td>05.14.2019</td>\n",
+       "      <td>JTDKN3DU5A1092792</td>\n",
+       "      <td>No</td>\n",
+       "      <td>Yes</td>\n",
+       "      <td>Test Driver</td>\n",
+       "      <td>Freeway</td>\n",
+       "      <td>Lane change maneuver to the exit lane: risk of...</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "    Manufacturer Permit Number        DATE         VIN NUMBER  \\\n",
+       "0  AImotive Inc.        AVT003  12.06.2018  JTDKN3DU5A1092792   \n",
+       "1  AImotive Inc.        AVT003  12.10.2018  JTDKN3DU5A1092792   \n",
+       "2  AImotive Inc.        AVT003  12.10.2018  JTDKN3DU5A1092792   \n",
+       "3  AImotive Inc.        AVT003  04.23.2019  JTDKN3DU5A1092792   \n",
+       "4  AImotive Inc.        AVT003  05.14.2019  JTDKN3DU5A1092792   \n",
+       "\n",
+       "  OPERATING WITHOUT DRIVER DRIVER PRESENT DISENGAGEMENT INITIATED BY  \\\n",
+       "0                       No            Yes                Test Driver   \n",
+       "1                       No            Yes                Test Driver   \n",
+       "2                       No            Yes                Test Driver   \n",
+       "3                       No            Yes                Test Driver   \n",
+       "4                       No            Yes                Test Driver   \n",
+       "\n",
+       "  DISENGAGEMENT LOCATION                        FACTS CAUSING DISENGAGEMENT  \\\n",
+       "0                Freeway  Lane change maneuver: risk of lane departure, ...   \n",
+       "1                Freeway  Lane change maneuver: risk of lane departure, ...   \n",
+       "2                Freeway  Lane change maneuver: risk of lane departure, ...   \n",
+       "3                Freeway  Lane change maneuver: risk of lane departure, ...   \n",
+       "4                Freeway  Lane change maneuver to the exit lane: risk of...   \n",
+       "\n",
+       "   Problem class  \n",
+       "0              3  \n",
+       "1              3  \n",
+       "2              3  \n",
+       "3              6  \n",
+       "4              0  "
+      ]
+     },
+     "execution_count": 114,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_preprocessed.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 115,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<class 'pandas.core.frame.DataFrame'>\n",
+      "Index: 8884 entries, 0 to 8884\n",
+      "Data columns (total 10 columns):\n",
+      " #   Column                       Non-Null Count  Dtype \n",
+      "---  ------                       --------------  ----- \n",
+      " 0   Manufacturer                 8884 non-null   object\n",
+      " 1   Permit Number                8884 non-null   object\n",
+      " 2   DATE                         8884 non-null   object\n",
+      " 3   VIN NUMBER                   8884 non-null   object\n",
+      " 4   OPERATING WITHOUT DRIVER     8884 non-null   object\n",
+      " 5   DRIVER PRESENT               8884 non-null   object\n",
+      " 6   DISENGAGEMENT INITIATED BY   8884 non-null   object\n",
+      " 7   DISENGAGEMENT LOCATION       8884 non-null   object\n",
+      " 8   FACTS CAUSING DISENGAGEMENT  8884 non-null   object\n",
+      " 9   Problem class                8884 non-null   int64 \n",
+      "dtypes: int64(1), object(9)\n",
+      "memory usage: 1021.5+ KB\n"
+     ]
+    }
+   ],
+   "source": [
+    "data_preprocessed.info()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 116,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Problem class</th>\n",
+       "      <th>Manufacturer_Apple Inc.</th>\n",
+       "      <th>Manufacturer_Aurora Innovation, Inc.</th>\n",
+       "      <th>Manufacturer_AutoX Technologies, Inc.</th>\n",
+       "      <th>Manufacturer_BMW of North America</th>\n",
+       "      <th>Manufacturer_Baidu USA LLC</th>\n",
+       "      <th>Manufacturer_CRUISE LLC</th>\n",
+       "      <th>Manufacturer_Drive.ai Inc</th>\n",
+       "      <th>Manufacturer_Lyft</th>\n",
+       "      <th>Manufacturer_Mercedes-Benz Research &amp; Development North America, Inc.</th>\n",
+       "      <th>...</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, \\nother road user behaving poorly</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, AV made unsuccessful left turn</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, third party lane encroachment</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, third party lane obstruction</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, AV lane change issues</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, AV made unsuccessful left turn</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, other road user behaving poorly</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane encroachment</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane obstruction</th>\n",
+       "      <th>FACTS CAUSING DISENGAGEMENT_prediction discrepancy, a vehicle in the front was backing up, ego was not able to predict this behavior correctly.</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>3</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>...</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>3</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>...</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>3</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>...</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>6</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>...</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>0</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>...</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "      <td>False</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>5 rows × 4538 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   Problem class  Manufacturer_Apple Inc.  \\\n",
+       "0              3                    False   \n",
+       "1              3                    False   \n",
+       "2              3                    False   \n",
+       "3              6                    False   \n",
+       "4              0                    False   \n",
+       "\n",
+       "   Manufacturer_Aurora Innovation, Inc.  \\\n",
+       "0                                 False   \n",
+       "1                                 False   \n",
+       "2                                 False   \n",
+       "3                                 False   \n",
+       "4                                 False   \n",
+       "\n",
+       "   Manufacturer_AutoX Technologies, Inc.  Manufacturer_BMW of North America  \\\n",
+       "0                                  False                              False   \n",
+       "1                                  False                              False   \n",
+       "2                                  False                              False   \n",
+       "3                                  False                              False   \n",
+       "4                                  False                              False   \n",
+       "\n",
+       "   Manufacturer_Baidu USA LLC  Manufacturer_CRUISE LLC  \\\n",
+       "0                       False                    False   \n",
+       "1                       False                    False   \n",
+       "2                       False                    False   \n",
+       "3                       False                    False   \n",
+       "4                       False                    False   \n",
+       "\n",
+       "   Manufacturer_Drive.ai Inc  Manufacturer_Lyft  \\\n",
+       "0                      False              False   \n",
+       "1                      False              False   \n",
+       "2                      False              False   \n",
+       "3                      False              False   \n",
+       "4                      False              False   \n",
+       "\n",
+       "   Manufacturer_Mercedes-Benz Research & Development North America, Inc.  ...  \\\n",
+       "0                                              False                      ...   \n",
+       "1                                              False                      ...   \n",
+       "2                                              False                      ...   \n",
+       "3                                              False                      ...   \n",
+       "4                                              False                      ...   \n",
+       "\n",
+       "   FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, \\nother road user behaving poorly  \\\n",
+       "0                                              False                                                                                                           \n",
+       "1                                              False                                                                                                           \n",
+       "2                                              False                                                                                                           \n",
+       "3                                              False                                                                                                           \n",
+       "4                                              False                                                                                                           \n",
+       "\n",
+       "   FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, AV made unsuccessful left turn  \\\n",
+       "0                                              False                                                                                                        \n",
+       "1                                              False                                                                                                        \n",
+       "2                                              False                                                                                                        \n",
+       "3                                              False                                                                                                        \n",
+       "4                                              False                                                                                                        \n",
+       "\n",
+       "   FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, third party lane encroachment  \\\n",
+       "0                                              False                                                                                                       \n",
+       "1                                              False                                                                                                       \n",
+       "2                                              False                                                                                                       \n",
+       "3                                              False                                                                                                       \n",
+       "4                                              False                                                                                                       \n",
+       "\n",
+       "   FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \\nprecautionary takeover to address perception, third party lane obstruction  \\\n",
+       "0                                              False                                                                                                      \n",
+       "1                                              False                                                                                                      \n",
+       "2                                              False                                                                                                      \n",
+       "3                                              False                                                                                                      \n",
+       "4                                              False                                                                                                      \n",
+       "\n",
+       "   FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, AV lane change issues  \\\n",
+       "0                                              False                                               \n",
+       "1                                              False                                               \n",
+       "2                                              False                                               \n",
+       "3                                              False                                               \n",
+       "4                                              False                                               \n",
+       "\n",
+       "   FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, AV made unsuccessful left turn  \\\n",
+       "0                                              False                                                        \n",
+       "1                                              False                                                        \n",
+       "2                                              False                                                        \n",
+       "3                                              False                                                        \n",
+       "4                                              False                                                        \n",
+       "\n",
+       "   FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, other road user behaving poorly  \\\n",
+       "0                                              False                                                         \n",
+       "1                                              False                                                         \n",
+       "2                                              False                                                         \n",
+       "3                                              False                                                         \n",
+       "4                                              False                                                         \n",
+       "\n",
+       "   FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane encroachment  \\\n",
+       "0                                              False                                                       \n",
+       "1                                              False                                                       \n",
+       "2                                              False                                                       \n",
+       "3                                              False                                                       \n",
+       "4                                              False                                                       \n",
+       "\n",
+       "   FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane obstruction  \\\n",
+       "0                                              False                                                      \n",
+       "1                                              False                                                      \n",
+       "2                                              False                                                      \n",
+       "3                                              False                                                      \n",
+       "4                                              False                                                      \n",
+       "\n",
+       "   FACTS CAUSING DISENGAGEMENT_prediction discrepancy, a vehicle in the front was backing up, ego was not able to predict this behavior correctly.  \n",
+       "0                                              False                                                                                                \n",
+       "1                                              False                                                                                                \n",
+       "2                                              False                                                                                                \n",
+       "3                                              False                                                                                                \n",
+       "4                                              False                                                                                                \n",
+       "\n",
+       "[5 rows x 4538 columns]"
+      ]
+     },
+     "execution_count": 116,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_with_dummies = pd.get_dummies(data_preprocessed, drop_first=True)\n",
+    "data_with_dummies.head()"
+   ]
+  },
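+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`pd.get_dummies` expanded the data to 4538 columns. A minimal sketch for seeing which original columns these dummies come from: group the column names by the text before the first underscore, which `pd.get_dummies` uses by default to separate the original column name from the category value. This assumes that the original column names do not contain an underscore themselves."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Count how many dummy columns each original column produced.\n",
+    "# Assumption: original column names contain no '_', so the text before\n",
+    "# the first '_' is the name of the source column.\n",
+    "import pandas as pd\n",
+    "\n",
+    "source_column = pd.Series(data_with_dummies.columns.str.split('_', n=1).str[0])\n",
+    "source_column.value_counts()"
+   ]
+  },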
+  {
+   "cell_type": "code",
+   "execution_count": 117,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Problem class                                                                                                                                      0\n",
+       "Manufacturer_Apple Inc.                                                                                                                            0\n",
+       "Manufacturer_Aurora Innovation, Inc.                                                                                                               0\n",
+       "Manufacturer_AutoX Technologies, Inc.                                                                                                              0\n",
+       "Manufacturer_BMW of North America                                                                                                                  0\n",
+       "                                                                                                                                                  ..\n",
+       "FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, AV made unsuccessful left turn                                             0\n",
+       "FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, other road user behaving poorly                                            0\n",
+       "FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane encroachment                                              0\n",
+       "FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane obstruction                                               0\n",
+       "FACTS CAUSING DISENGAGEMENT_prediction discrepancy, a vehicle in the front was backing up, ego was not able to predict this behavior correctly.    0\n",
+       "Length: 4538, dtype: int64"
+      ]
+     },
+     "execution_count": 117,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_with_dummies.isnull().sum()"
+   ]
+  },
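+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The per-column output above is truncated to a few of the 4538 columns. Summing once more gives a single total over the whole frame; a result of 0 would confirm that no column has missing values."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Total number of missing values in the whole frame (0 means none).\n",
+    "data_with_dummies.isnull().sum().sum()"
+   ]
+  },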
+  {
+   "cell_type": "code",
+   "execution_count": 118,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.model_selection import train_test_split\n",
+    "from sklearn.metrics import confusion_matrix, accuracy_score, classification_report"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 119,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "target = data_with_dummies['Problem class']\n",
+    "inputs = data_with_dummies.drop(columns=['Problem class'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 120,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x_train, x_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2, random_state=365)"
+   ]
+  },
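+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The split above is not stratified, and some problem classes are very rare (the classification report of the decision tree lists a single test sample for class 3). A minimal sketch for comparing the class proportions of both subsets with `value_counts(normalize=True)`. Passing `stratify=target` to `train_test_split` would keep these proportions equal, but scikit-learn raises an error if any class has fewer than two samples, so whether this is possible depends on the class counts in `target`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Share of each problem class in the training and test targets.\n",
+    "import pandas as pd\n",
+    "\n",
+    "class_shares = pd.DataFrame({\n",
+    "    'train': y_train.value_counts(normalize=True),\n",
+    "    'test': y_test.value_counts(normalize=True),\n",
+    "}).sort_index()\n",
+    "\n",
+    "# Hypothetical stratified alternative (not used in this notebook); it fails\n",
+    "# if any class in target has fewer than two samples:\n",
+    "# train_test_split(inputs, target, test_size=0.2, random_state=365, stratify=target)\n",
+    "class_shares"
+   ]
+  },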
+  {
+   "cell_type": "code",
+   "execution_count": 121,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "<Axes: xlabel='Problem class', ylabel='Density'>"
+      ]
+     },
+     "execution_count": 121,
+     "metadata": {},
+     "output_type": "execute_result"
+    },
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       "<Figure size 640x480 with 1 Axes>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "sns.histplot(y_test, bins=50, stat=\"density\", kde=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Decision Tree"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 124,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.tree import DecisionTreeClassifier"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 125,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dtree = DecisionTreeClassifier()"
+   ]
+  },
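+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before fitting the tree, it can help to know what accuracy a trivial model reaches on the same split. The sketch below uses scikit-learn's `DummyClassifier` with `strategy='most_frequent'` (not part of the original analysis), which always predicts the most common class in `y_train`; the decision tree's accuracy can then be read against this reference."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Reference model that always predicts the most frequent class in y_train.\n",
+    "from sklearn.dummy import DummyClassifier\n",
+    "\n",
+    "baseline = DummyClassifier(strategy='most_frequent')\n",
+    "baseline.fit(x_train, y_train)\n",
+    "accuracy_score(y_test, baseline.predict(x_test))"
+   ]
+  },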
+  {
+   "cell_type": "code",
+   "execution_count": 126,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "DecisionTreeClassifier()"
+      ]
+     },
+     "execution_count": 126,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "dtree.fit(x_train, y_train)"
+   ]
+  },
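+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A minimal sketch for seeing which dummy columns the fitted tree relies on most, using the impurity-based `feature_importances_` attribute of `DecisionTreeClassifier`. These importances are computed from the training data only, so they describe the fitted tree rather than its performance on unseen data; and because the tree is fitted without a fixed `random_state`, the ranking may change slightly between runs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Ten dummy columns with the highest impurity-based importance in the fitted tree.\n",
+    "import pandas as pd\n",
+    "\n",
+    "importances = pd.Series(dtree.feature_importances_, index=x_train.columns)\n",
+    "importances.sort_values(ascending=False).head(10)"
+   ]
+  },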
+  {
+   "cell_type": "code",
+   "execution_count": 127,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "predictions = dtree.predict(x_test)"
+   ]
+  },
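+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`confusion_matrix` and `accuracy_score` are imported earlier but not used yet. A minimal sketch that applies both to the test predictions; the matrix shows which problem classes are mistaken for which, complementing the per-class figures of the classification report."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Rows are true classes, columns are predicted classes.\n",
+    "import pandas as pd\n",
+    "\n",
+    "labels = sorted(set(y_test).union(predictions))\n",
+    "cm = confusion_matrix(y_test, predictions, labels=labels)\n",
+    "print('Accuracy:', accuracy_score(y_test, predictions))\n",
+    "pd.DataFrame(cm, index=labels, columns=labels)"
+   ]
+  },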
+  {
+   "cell_type": "code",
+   "execution_count": 128,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "              precision    recall  f1-score   support\n",
+      "\n",
+      "           0       0.99      0.99      0.99       531\n",
+      "           1       1.00      0.94      0.97        17\n",
+      "           2       0.99      0.98      0.98       527\n",
+      "           3       0.00      0.00      0.00         1\n",
+      "           4       0.79      0.96      0.87        24\n",
+      "           5       1.00      0.80      0.89        10\n",
+      "           6       0.99      0.99      0.99       667\n",
+      "\n",
+      "    accuracy                           0.99      1777\n",
+      "   macro avg       0.82      0.81      0.81      1777\n",
+      "weighted avg       0.99      0.99      0.99      1777\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(classification_report(y_test, predictions))"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}