1. Business Understanding¶¶

Diese Fallstudie soll Technologieunternehmen bei der Entwicklung und Verbesserung von Komponenten für autonome Fahrzeuge unterstützen. Dabei wirft diese Fallstudie die Frage auf: In welchen Bereichen liegen die Schwächen der getesteten Systeme und wie lassen sich diese einordnen? Die Frage wird beantwortet, indem die Gründe für den Eingriff des menschlichen Testfahrers in das System klassifiziert werden. Dabei sollen die Gründe in verschiedene Kategorien eingeteilt werden:

  • Software Probleme
  • Hardware Probleme
  • Software und Hardware Probleme (Probleme, die in Kombination auftreten)
  • Probleme, die durch Objekte der Verkehrssteuerung verursacht werden (z. B. Fehlfunktionen von Lichtsignalanlagen)
  • Probleme, die durch andere Verkehrsteilnehmer verursacht werden
  • äußere Einflüsse (einschließlich z. B. Blockaden, verborgene Elemente, Wetter- und Straßenbedingungen)
  • sonstige Probleme

2. Daten und Datenverständnis¶

Der zugrundeliegende Datensatz wurde von Kaggle bezogen und zeigt notwendige Eingriffe eines Fahrers in autonom fahrende Fahrzeugtechnologie. Er wurde vom 1. Dezember 2018 bis zum 30. November 2019 aufgezeichnet und steht unter der Lizenz von U.S. Government Works. Die Datenverarbeitung erfolgt mit Hilfe eines Jupyter-Notebooks, das in der populärsten Data-Science-Plattform "Anaconda" enthalten ist. Der Datensatz liegt im Format 8885 x 9 vor. Die folgenden Daten wurden erfasst:

  • Hersteller
  • Zulassungsnummer
  • Datum
  • Fahrzeug-Identifikationsnummer
  • Möglichkeit des Betriebs ohne Fahrer
  • Ist ein Fahrer vor Ort?
  • Unterbrechung des autonomen Fahrens durch AV-System, Testfahrer, Fernsteuerung oder Beifahrer
  • Ort des Vorfalls: Autobahn, Schnellstraße, Landstraße, Straße oder Parkplatz
  • Beschreibung der Ursachen

2.1. Import von relevanten Modulen¶

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
In [2]:
pd.__version__
Out[2]:
'1.3.4'

2.2 Daten einlesen¶

In [3]:
raw_data = pd.read_csv('https://storage.googleapis.com/ml-service-repository-datastorage/Improvement_of_components_for_autonomous_motor_vehicles_data.csv')
In [4]:
raw_data.head()
Out[4]:
Manufacturer Permit Number DATE VIN NUMBER VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\n(Yes or No) DRIVER PRESENT\n(Yes or No) DISENGAGEMENT INITIATED BY\n(AV System, Test Driver, Remote Operator, or Passenger) DISENGAGEMENT\nLOCATION\n(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility) DESCRIPTION OF FACTS CAUSING DISENGAGEMENT
0 AImotive Inc. AVT003 12.06.2018 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ...
1 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ...
2 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ...
3 AImotive Inc. AVT003 04.23.2019 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ...
4 AImotive Inc. AVT003 05.14.2019 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver to the exit lane: risk of...

2.3. Daten bereinigen¶

In [5]:
raw_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8885 entries, 0 to 8884
Data columns (total 9 columns):
 #   Column                                                                                          Non-Null Count  Dtype 
---  ------                                                                                          --------------  ----- 
 0   Manufacturer                                                                                    8885 non-null   object
 1   Permit Number                                                                                   8885 non-null   object
 2   DATE                                                                                            8884 non-null   object
 3   VIN NUMBER                                                                                      8884 non-null   object
 4   VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER
(Yes or No)                                    8884 non-null   object
 5   DRIVER PRESENT
(Yes or No)                                                                      8884 non-null   object
 6   DISENGAGEMENT INITIATED BY
(AV System, Test Driver, Remote Operator, or Passenger)              8884 non-null   object
 7   DISENGAGEMENT
LOCATION
(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility)  8884 non-null   object
 8   DESCRIPTION OF FACTS CAUSING DISENGAGEMENT                                                      8884 non-null   object
dtypes: object(9)
memory usage: 624.9+ KB
In [6]:
raw_data.describe(include="all")
Out[6]:
Manufacturer Permit Number DATE VIN NUMBER VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\n(Yes or No) DRIVER PRESENT\n(Yes or No) DISENGAGEMENT INITIATED BY\n(AV System, Test Driver, Remote Operator, or Passenger) DISENGAGEMENT\nLOCATION\n(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility) DESCRIPTION OF FACTS CAUSING DISENGAGEMENT
count 8885 8885 8884 8884 8884 8884 8884 8884 8884
unique 28 27 3711 289 5 4 4 11 469
top Toyota Research Institute AVT050 3/28/2019 JTHDU1EF3G5020098 No Yes Test Driver Street Safety Driver proactive disengagement.
freq 2947 2947 59 900 4369 4934 6037 4668 1780
In [7]:
raw_data.rename(columns = {'VEHICLE IS CAPABLE OF OPERATING WITHOUT A DRIVER\n(Yes or No)':'OPERATING WITHOUT DRIVER', 'DRIVER PRESENT\n(Yes or No)':'DRIVER PRESENT', 'DISENGAGEMENT INITIATED BY\n(AV System, Test Driver, Remote Operator, or Passenger)':'DISENGAGEMENT INITIATED BY', 'DISENGAGEMENT\nLOCATION\n(Interstate, Freeway, Highway, Rural Road, Street, or Parking Facility)':'DISENGAGEMENT LOCATION', 'DESCRIPTION OF FACTS CAUSING DISENGAGEMENT':'FACTS CAUSING DISENGAGEMENT'}, inplace = True) 

Fehlende Werte¶

In [8]:
raw_data.isnull().sum()
Out[8]:
Manufacturer                   0
Permit Number                  0
DATE                           1
VIN NUMBER                     1
OPERATING WITHOUT DRIVER       1
DRIVER PRESENT                 1
DISENGAGEMENT INITIATED BY     1
DISENGAGEMENT LOCATION         1
FACTS CAUSING DISENGAGEMENT    1
dtype: int64
In [9]:
data_no_mv=raw_data.dropna(axis=0)
data_no_mv.isnull().sum()
Out[9]:
Manufacturer                   0
Permit Number                  0
DATE                           0
VIN NUMBER                     0
OPERATING WITHOUT DRIVER       0
DRIVER PRESENT                 0
DISENGAGEMENT INITIATED BY     0
DISENGAGEMENT LOCATION         0
FACTS CAUSING DISENGAGEMENT    0
dtype: int64
In [10]:
data_no_mv.describe(include='all')
Out[10]:
Manufacturer Permit Number DATE VIN NUMBER OPERATING WITHOUT DRIVER DRIVER PRESENT DISENGAGEMENT INITIATED BY DISENGAGEMENT LOCATION FACTS CAUSING DISENGAGEMENT
count 8884 8884 8884 8884 8884 8884 8884 8884 8884
unique 27 26 3711 289 5 4 4 11 469
top Toyota Research Institute AVT050 3/28/2019 JTHDU1EF3G5020098 No Yes Test Driver Street Safety Driver proactive disengagement.
freq 2947 2947 59 900 4369 4934 6037 4668 1780

Datenaufbereitung und Kennzeichnung¶

In [11]:
data_no_mv['DISENGAGEMENT INITIATED BY'].value_counts()
Out[11]:
Test Driver         6037
AV System           2698
Vehicle Operator      81
Test driver           68
Name: DISENGAGEMENT INITIATED BY, dtype: int64
In [12]:
data_no_mv['FACTS CAUSING DISENGAGEMENT'].nunique()
Out[12]:
469
In [13]:
data_no_mv.head()
Out[13]:
Manufacturer Permit Number DATE VIN NUMBER OPERATING WITHOUT DRIVER DRIVER PRESENT DISENGAGEMENT INITIATED BY DISENGAGEMENT LOCATION FACTS CAUSING DISENGAGEMENT
0 AImotive Inc. AVT003 12.06.2018 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ...
1 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ...
2 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ...
3 AImotive Inc. AVT003 04.23.2019 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ...
4 AImotive Inc. AVT003 05.14.2019 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver to the exit lane: risk of...
In [14]:
data_no_mv['FACTS CAUSING DISENGAGEMENT'].value_counts()
Out[14]:
Safety Driver proactive disengagement.                                                                                                                                  1780
Disengage for unwanted maneuver of the vehicle caused by a planning discrepancy while generating an appropriate trajectory                                               805
Automatic disengagement caused by planner fault.                                                                                                                         742
Disengage due to operator discomfort                                                                                                                                     636
Disengage for a software fault due to a potential performance issue with a software component of the self-driving system (including third party software components)     482
                                                                                                                                                                        ... 
Planning Logic: planner inadequately yields for cross traffic agent with right-of-way                                                                                      1
Planning Logic: planned trajectory fails to avoid vehicle stopped ahead intersection                                                                                       1
Object Perception: inaccurate perception of animal slowly crossing road leads to planned trajectory overlap                                                                1
Planning Logic: incorrect behavior prediction for oncoming vehicle results in a planned trajectory that overlaps with the vehicle                                          1
Planning discrepancy; system planned incorrect trajectory to avoid oncoming traffic                                                                                        1
Name: FACTS CAUSING DISENGAGEMENT, Length: 469, dtype: int64
In [15]:
data_no_mv['DISENGAGEMENT INITIATED BY'].value_counts()
Out[15]:
Test Driver         6037
AV System           2698
Vehicle Operator      81
Test driver           68
Name: DISENGAGEMENT INITIATED BY, dtype: int64
In [16]:
data_reduced = data_no_mv.loc[0:1999, :]
In [17]:
data_reduced.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2000 entries, 0 to 1999
Data columns (total 9 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   Manufacturer                 2000 non-null   object
 1   Permit Number                2000 non-null   object
 2   DATE                         2000 non-null   object
 3   VIN NUMBER                   2000 non-null   object
 4   OPERATING WITHOUT DRIVER     2000 non-null   object
 5   DRIVER PRESENT               2000 non-null   object
 6   DISENGAGEMENT INITIATED BY   2000 non-null   object
 7   DISENGAGEMENT LOCATION       2000 non-null   object
 8   FACTS CAUSING DISENGAGEMENT  2000 non-null   object
dtypes: object(9)
memory usage: 156.2+ KB
In [18]:
data_reduced['FACTS CAUSING DISENGAGEMENT'].nunique()
Out[18]:
92
In [19]:
data_reduced.drop_duplicates(subset ="FACTS CAUSING DISENGAGEMENT", 
                     keep = 'first', inplace = True) 
/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/pandas/util/_decorators.py:311: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return func(*args, **kwargs)
In [20]:
data_reduced = data_reduced.reset_index()
In [21]:
data_reduced['FACTS CAUSING DISENGAGEMENT'] = data_reduced['FACTS CAUSING DISENGAGEMENT'].astype('str')
In [22]:
data_reduced['FACTS CAUSING DISENGAGEMENT'].dtype
Out[22]:
dtype('O')
In [23]:
import re
In [24]:
software_keywords = ["software performance", "software fault", "software discrepancy"
                     "trajectory planning", "planning discrepancy", "planning error",
                     "wrong lane change suggestion", "wrong lane association", "data recording",
                     "improper lane-change plan", "undesirable manuever", "undesirable yielding maneuver",
                     "outside of rate requirements", "merged poorly", "mapping issue", "software issue",
                    "poor trajectory across lanes", "incorrect assessment", "incorrect behavior",
                    "unprotected", "Poor lane change", "very wide", "wrong object prediction", "undesired motion",
                    "unwanted maneuver", "perception discrepancy", "ghost object prediction",
                    "driving faster than driver expected", "expected path",
                    "not initialized correctly", "software module", "perception mismatch", "estimation",
                    "planner fault", "unstable"]
In [25]:
data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(software_keywords), na=False, case=False)
                  , 'Problem class'] = "Software" 
/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py:1684: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = infer_fill_value(value)
/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py:1817: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)
In [26]:
hardware_keywords = ["hardware performance", "hardware diagnostics", "controls diagnostics", "actor", 
                    "yield to other actors", "Hardware irregularity", "weather conditions", "Autobox", 
                    "performance issue with a hardware component"]
In [27]:
data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(hardware_keywords), na=False, case=False)
                  , 'Problem class'] = "Hardware" 
In [28]:
soft_hardware_keywords = ["timed out", "timeout", "too long", "incorrect detection", "Lost track", "Localization", 
                         "geo-location related", "unsuccessful right turn", "unsuccessful left turn", "system",
                         "traffic conditions", "failed to detect an object correctly", " took longer than expected",
                         "main computer froze", "not braking correctly", "not speeding up correctly",
                         "not turning enough", "not slowing down enough", "didn't detect", "Sensor Fusion discrepancy",
                         "did not meet expectation"]
In [29]:
data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(soft_hardware_keywords), na=False, case=False)
                  , 'Problem class'] = "Software/Hardware" 
In [30]:
Traffic_ctrl_keywords = ["unstable target lane", "Traffic light error", "Stop sign error"]
In [31]:
data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(Traffic_ctrl_keywords), na=False, case=False)
                  , 'Problem class'] = "Traffic control objects" 
In [32]:
road_user_keywords = ["reckless driver", "behaving road user", "other road user"]
In [33]:
data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(road_user_keywords), na=False, case=False)
                  , 'Problem class'] = "Other road user" 
In [34]:
external_influences = ["obstruction", "encroachment", "occluded view", "surface conditions", "wheater"]
In [35]:
data_no_mv.loc[data_no_mv['FACTS CAUSING DISENGAGEMENT'].str.contains('|'.join(external_influences), na=False, case=False)
                  , 'Problem class'] = "External influences" 
In [36]:
data_no_mv.loc[data_no_mv['Problem class'].isnull()
                  , 'Problem class'] = "Other problems" 
In [37]:
data_no_mv.head()
Out[37]:
Manufacturer Permit Number DATE VIN NUMBER OPERATING WITHOUT DRIVER DRIVER PRESENT DISENGAGEMENT INITIATED BY DISENGAGEMENT LOCATION FACTS CAUSING DISENGAGEMENT Problem class
0 AImotive Inc. AVT003 12.06.2018 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ... Traffic control objects
1 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ... Traffic control objects
2 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ... Traffic control objects
3 AImotive Inc. AVT003 04.23.2019 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ... Other problems
4 AImotive Inc. AVT003 05.14.2019 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver to the exit lane: risk of... Software
In [38]:
from collections import Counter
plt.title("Problems")
plt.xticks(rotation = 45)
plt.bar(dict(Counter(data_reduced_classified['Problem class'])).keys(), dict(Counter(data_reduced_classified['Problem class'])).values())
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/var/folders/kh/bds3ggxd09gbnhkp6c414s9h0000gn/T/ipykernel_20061/3854068531.py in <module>
      2 plt.title("Problems")
      3 plt.xticks(rotation = 45)
----> 4 plt.bar(dict(Counter(data_reduced_classified['Problem class'])).keys(), dict(Counter(data_reduced_classified['Problem class'])).values())

NameError: name 'data_reduced_classified' is not defined
No description has been provided for this image

Klassifikationsmodell¶

In [39]:
data_preprocessed = data_no_mv.copy()
data_preprocessed['Problem class']=data_preprocessed['Problem class'].map(
    {'Software':0,'Hardware':1,'Software/Hardware':2,
    'Traffic control objects':3,'Other road user':4, 'External influences':5,
     'Other problems':6})
In [40]:
data_preprocessed.head()
Out[40]:
Manufacturer Permit Number DATE VIN NUMBER OPERATING WITHOUT DRIVER DRIVER PRESENT DISENGAGEMENT INITIATED BY DISENGAGEMENT LOCATION FACTS CAUSING DISENGAGEMENT Problem class
0 AImotive Inc. AVT003 12.06.2018 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ... 3
1 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ... 3
2 AImotive Inc. AVT003 12.10.2018 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ... 3
3 AImotive Inc. AVT003 04.23.2019 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver: risk of lane departure, ... 6
4 AImotive Inc. AVT003 05.14.2019 JTDKN3DU5A1092792 No Yes Test Driver Freeway Lane change maneuver to the exit lane: risk of... 0
In [41]:
data_preprocessed.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 8884 entries, 0 to 8884
Data columns (total 10 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   Manufacturer                 8884 non-null   object
 1   Permit Number                8884 non-null   object
 2   DATE                         8884 non-null   object
 3   VIN NUMBER                   8884 non-null   object
 4   OPERATING WITHOUT DRIVER     8884 non-null   object
 5   DRIVER PRESENT               8884 non-null   object
 6   DISENGAGEMENT INITIATED BY   8884 non-null   object
 7   DISENGAGEMENT LOCATION       8884 non-null   object
 8   FACTS CAUSING DISENGAGEMENT  8884 non-null   object
 9   Problem class                8884 non-null   int64 
dtypes: int64(1), object(9)
memory usage: 1021.5+ KB
In [42]:
data_with_dummies = pd.get_dummies(data_preprocessed, drop_first=True)
data_with_dummies.head()
Out[42]:
Problem class Manufacturer_Apple Inc. Manufacturer_Aurora Innovation, Inc. Manufacturer_AutoX Technologies, Inc. Manufacturer_BMW of North America Manufacturer_Baidu USA LLC Manufacturer_CRUISE LLC Manufacturer_Drive.ai Inc Manufacturer_Lyft Manufacturer_Mercedes-Benz Research & Development North America, Inc. ... FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \nprecautionary takeover to address perception, \nother road user behaving poorly FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \nprecautionary takeover to address perception, AV made unsuccessful left turn FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \nprecautionary takeover to address perception, third party lane encroachment FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, \nprecautionary takeover to address perception, third party lane obstruction FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, AV lane change issues FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, AV made unsuccessful left turn FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, other road user behaving poorly FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane encroachment FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane obstruction FACTS CAUSING DISENGAGEMENT_prediction discrepancy, a vehicle in the front was backing up, ego was not able to predict this behavior correctly.
0 3 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 3 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 3 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 6 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 4538 columns

In [43]:
data_with_dummies.isnull().sum()
Out[43]:
Problem class                                                                                                                                      0
Manufacturer_Apple Inc.                                                                                                                            0
Manufacturer_Aurora Innovation, Inc.                                                                                                               0
Manufacturer_AutoX Technologies, Inc.                                                                                                              0
Manufacturer_BMW of North America                                                                                                                  0
                                                                                                                                                  ..
FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, AV made unsuccessful left turn                                             0
FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, other road user behaving poorly                                            0
FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane encroachment                                              0
FACTS CAUSING DISENGAGEMENT_precautionary takeover to address planning, third party lane obstruction                                               0
FACTS CAUSING DISENGAGEMENT_prediction discrepancy, a vehicle in the front was backing up, ego was not able to predict this behavior correctly.    0
Length: 4538, dtype: int64
In [44]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
In [45]:
target = data_with_dummies['Problem class']
inputs = data_with_dummies.drop(['Problem class'],axis=1)
In [46]:
x_train, x_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2, random_state=365)
In [47]:
sns.distplot((y_test),bins=50)
/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
Out[47]:
<AxesSubplot:xlabel='Problem class', ylabel='Density'>
No description has been provided for this image

K-Nearest Neighbors¶

In [48]:
from sklearn.neighbors import KNeighborsClassifier
In [49]:
error_rate = []
 
for i in range(1,10):
    
    KNN_model = KNeighborsClassifier(n_neighbors=i)
    KNN_model.fit(x_train,y_train)
    KNN_prediction = KNN_model.predict(x_test)
    error_rate.append(np.mean(KNN_prediction != y_test)) #nur Fehler berücksichtigen

plt.figure(figsize=(10,6))
plt.plot(range(1,10),error_rate,color='blue', linestyle='dashed', marker='o',
         markerfacecolor='red', markersize=10)
plt.title('Error Rate vs. K Values')
plt.xlabel('K')
plt.ylabel('Error Rate')
Out[49]:
Text(0, 0.5, 'Error Rate')
No description has been provided for this image
In [50]:
KNN_model = KNeighborsClassifier(n_neighbors=3)
In [51]:
KNN_model.fit(x_train, y_train)
Out[51]:
KNeighborsClassifier(n_neighbors=3)
In [52]:
KNN_prediction = KNN_model.predict(x_test)
In [53]:
accuracy_score(KNN_prediction, y_test)
Out[53]:
0.9712999437253799
In [54]:
print(classification_report(KNN_prediction, y_test))
              precision    recall  f1-score   support

           0       0.98      0.97      0.97       538
           1       0.82      0.82      0.82        17
           2       0.97      0.97      0.97       530
           3       0.00      0.00      0.00         0
           4       0.83      0.74      0.78        27
           5       0.70      1.00      0.82         7
           6       0.98      0.99      0.98       658

    accuracy                           0.97      1777
   macro avg       0.76      0.78      0.77      1777
weighted avg       0.97      0.97      0.97      1777

/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1308: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1308: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/Users/yassermurtada/opt/anaconda3/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1308: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

Entscheidungsbaum¶

In [55]:
from sklearn.tree import DecisionTreeClassifier
In [56]:
dtree = DecisionTreeClassifier()
In [57]:
dtree.fit(x_train,y_train)
Out[57]:
DecisionTreeClassifier()
In [58]:
predictions = dtree.predict(x_test)
In [59]:
print(classification_report(y_test,predictions))
              precision    recall  f1-score   support

           0       0.99      0.99      0.99       531
           1       0.94      1.00      0.97        17
           2       0.99      0.98      0.99       527
           3       1.00      1.00      1.00         1
           4       0.79      0.96      0.87        24
           5       1.00      0.80      0.89        10
           6       0.99      0.99      0.99       667

    accuracy                           0.99      1777
   macro avg       0.96      0.96      0.96      1777
weighted avg       0.99      0.99      0.99      1777

In [ ]: