Commit 251f8de6 authored by Andreas Buzer's avatar Andreas Buzer

Merge branch 'improvement_ClothingClassification' into 'main'

Notebook tagged and description added

See merge request ki_lab/machine-learning-services!16
%% Cell type:markdown id: tags:
## 1. Business Understanding
%% Cell type:markdown id: tags:business
Many online mail-order companies have a high return rate (of up to 50%), while 97% of all returned products can be restocked and sold again. To resell the goods, they must be identified, labeled, and restocked accordingly.
Assuming that 185.5 million orders (Statista, 2021) with 6 items each (an assumption) were received in 2020, a return rate of 50% would mean that 556.5 million items would have to be re-identified and re-categorized.
To support this process and to facilitate the identification of returned garments, image recognition software is to be developed that recognizes the category of each garment from an image.
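The volume estimate above can be reproduced with a quick calculation; the figures are the ones assumed in the text, not independently verified:

``` python
# Rough estimate of the yearly return volume, using the assumptions above.
orders = 185_500_000      # orders in 2020 (Statista, 2021)
items_per_order = 6       # assumed items per order
return_rate = 0.5         # assumed return rate

returned_items = orders * items_per_order * return_rate
print(f"{returned_items:,.0f} items to re-identify")  # 556,500,000 items to re-identify
```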
%% Cell type:markdown id: tags:
## 2. Data Understanding
%% Cell type:markdown id: tags:daten
The **fashion-mnist_test** dataframe comes from Kaggle and is accessible via the [Fashion-MNIST GitHub Repository](https://github.com/zalandoresearch/fashion-mnist). The dataset was created by Zalando in 2017 and consists of images of the company's articles.
The dataset is provided in CSV format and contains a total of 70,000 images of garments, split into 60,000 training images and 10,000 test images. Each image has been scaled to 28x28 pixels and converted to grayscale.
The dataset comprises 784 features, each corresponding to one pixel of the image, plus an additional label indicating the garment's category. Both the features and the labels are stored as integer values. The pixel values represent the gray-level intensity, while the labels represent the different garment categories.
In total, the dataset contains 70,000 observations. Location parameters, distribution parameters, and correlation analysis are not applicable to this dataset.
The Fashion-MNIST dataset offers a modern and challenging alternative to the classic MNIST dataset, as it contains more realistic and more complex images of garments. This poses a greater challenge for image classification models and provides a more realistic application of machine vision.
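The row layout just described (one label column followed by 784 pixel columns) can be sketched with a synthetic row; the values in `row` below are made-up data, not taken from the dataset:

``` python
import numpy as np

# A synthetic CSV row: label followed by 784 grayscale pixel values (0-255).
rng = np.random.default_rng(0)
row = np.concatenate(([3], rng.integers(0, 256, size=784)))

label = int(row[0])               # category id, 0..9
pixels = row[1:].reshape(28, 28)  # flat 784-vector back to a 28x28 image

print(label, pixels.shape)  # 3 (28, 28)
```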
%% Cell type:markdown id: tags:
### 2.1. Import of Relevant Modules
%% Cell type:code id: tags:
``` python
pip install tensorflow-datasets
```
%% Output
Requirement already satisfied: tensorflow-datasets in c:\users\ar\anaconda3\lib\site-packages (4.9.6)
Requirement already satisfied: absl-py in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (2.1.0)
Requirement already satisfied: click in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (8.0.4)
Requirement already satisfied: dm-tree in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (0.1.8)
Requirement already satisfied: immutabledict in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (4.2.0)
Requirement already satisfied: numpy in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (1.24.3)
Requirement already satisfied: promise in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (2.3)
Requirement already satisfied: protobuf>=3.20 in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (4.25.3)
Requirement already satisfied: psutil in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (5.9.0)
Requirement already satisfied: pyarrow in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (11.0.0)
Requirement already satisfied: requests>=2.19.0 in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (2.31.0)
Requirement already satisfied: simple-parsing in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (0.1.5)
Requirement already satisfied: tensorflow-metadata in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (1.15.0)
Requirement already satisfied: termcolor in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (2.4.0)
Requirement already satisfied: toml in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (0.10.2)
Requirement already satisfied: tqdm in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (4.65.0)
Requirement already satisfied: wrapt in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-datasets) (1.14.1)
Requirement already satisfied: etils>=1.9.1 in c:\users\ar\anaconda3\lib\site-packages (from etils[enp,epath,epy,etree]>=1.9.1; python_version >= "3.11"->tensorflow-datasets) (1.9.2)
Requirement already satisfied: fsspec in c:\users\ar\anaconda3\lib\site-packages (from etils[enp,epath,epy,etree]>=1.9.1; python_version >= "3.11"->tensorflow-datasets) (2023.4.0)
Requirement already satisfied: importlib_resources in c:\users\ar\anaconda3\lib\site-packages (from etils[enp,epath,epy,etree]>=1.9.1; python_version >= "3.11"->tensorflow-datasets) (6.4.0)
Requirement already satisfied: typing_extensions in c:\users\ar\anaconda3\lib\site-packages (from etils[enp,epath,epy,etree]>=1.9.1; python_version >= "3.11"->tensorflow-datasets) (4.7.1)
Requirement already satisfied: zipp in c:\users\ar\anaconda3\lib\site-packages (from etils[enp,epath,epy,etree]>=1.9.1; python_version >= "3.11"->tensorflow-datasets) (3.11.0)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\ar\anaconda3\lib\site-packages (from requests>=2.19.0->tensorflow-datasets) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in c:\users\ar\anaconda3\lib\site-packages (from requests>=2.19.0->tensorflow-datasets) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\ar\anaconda3\lib\site-packages (from requests>=2.19.0->tensorflow-datasets) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\ar\anaconda3\lib\site-packages (from requests>=2.19.0->tensorflow-datasets) (2024.2.2)
Requirement already satisfied: colorama in c:\users\ar\anaconda3\lib\site-packages (from click->tensorflow-datasets) (0.4.6)
Requirement already satisfied: six in c:\users\ar\anaconda3\lib\site-packages (from promise->tensorflow-datasets) (1.16.0)
Requirement already satisfied: docstring-parser~=0.15 in c:\users\ar\anaconda3\lib\site-packages (from simple-parsing->tensorflow-datasets) (0.16)
Requirement already satisfied: googleapis-common-protos<2,>=1.56.4 in c:\users\ar\anaconda3\lib\site-packages (from tensorflow-metadata->tensorflow-datasets) (1.63.1)
Note: you may need to restart the kernel to use updated packages.
%% Cell type:code id: tags:
``` python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Conv2D, MaxPooling2D, Dropout, Layer
from tensorflow.keras.utils import to_categorical, plot_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
```
%% Cell type:code id: tags:
``` python
# Check the installed TensorFlow version (this notebook was run with 2.16.1)
tf.__version__
```
%% Output
'2.16.1'
%% Cell type:markdown id: tags:
### 2.2 Read Data
%% Cell type:markdown id: tags:
The training and test data are already labeled and split into two separate CSV files.
%% Cell type:code id: tags:
``` python
csv_file_train = "https://storage.googleapis.com/ml-service-repository-datastorage/Classification_of_clothing_through_images_fashion-mnist_train.csv"
csv_file_test = "https://storage.googleapis.com/ml-service-repository-datastorage/Classification_of_clothing_through_images_fashion-mnist_test.csv"
df_train = pd.read_csv(csv_file_train)
df_test = pd.read_csv(csv_file_test)
df_train.to_csv('df_train.csv', index=False)
df_test.to_csv('df_test.csv', index=False)
```
%% Cell type:markdown id: tags:
### 2.3. Data Analysis
%% Cell type:code id: tags:
``` python
df_train.head()
```
%% Output
label pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 \
0 2 0 0 0 0 0 0 0 0
1 9 0 0 0 0 0 0 0 0
2 6 0 0 0 0 0 0 0 5
3 0 0 0 0 1 2 0 0 0
4 3 0 0 0 0 0 0 0 0
pixel9 ... pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 \
0 0 ... 0 0 0 0 0 0
1 0 ... 0 0 0 0 0 0
2 0 ... 0 0 0 30 43 0
3 0 ... 3 0 0 0 0 1
4 0 ... 0 0 0 0 0 0
pixel781 pixel782 pixel783 pixel784
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
[5 rows x 785 columns]
%% Cell type:markdown id: tags:
Describing the dataframe is not particularly helpful in this case, but it shows that the data is not corrupted, by checking that:
- the label lies between 0 and 9,
- the pixel values lie between 0 and 255 (non-negative),
- the count is 60,000 (train) and 10,000 (test),
- the maximum number of pixels is 784 for all rows.
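These checks can also be made programmatic rather than visual. A minimal sketch, demonstrated on a tiny synthetic frame standing in for `df_train`/`df_test` (the `validate` helper is illustrative, not part of the notebook):

``` python
import numpy as np
import pandas as pd

def validate(df, expected_rows):
    """Assert the structural properties listed above."""
    assert len(df) == expected_rows
    assert df.shape[1] == 785                       # label + 784 pixel columns
    assert df["label"].between(0, 9).all()          # labels in 0..9
    pixels = df.drop(columns="label")
    assert ((pixels >= 0) & (pixels <= 255)).all().all()

# Demo on a tiny synthetic frame (stand-in for the real dataframes).
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.integers(0, 256, size=(5, 784)),
                    columns=[f"pixel{i}" for i in range(1, 785)])
demo.insert(0, "label", rng.integers(0, 10, size=5))
validate(demo, expected_rows=5)
print("all checks passed")
```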
%% Cell type:code id: tags:
``` python
df_train.describe()
```
%% Output
label pixel1 pixel2 pixel3 pixel4 \
count 60000.000000 60000.000000 60000.000000 60000.000000 60000.000000
mean 4.500000 0.000900 0.006150 0.035333 0.101933
std 2.872305 0.094689 0.271011 1.222324 2.452871
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 2.000000 0.000000 0.000000 0.000000 0.000000
50% 4.500000 0.000000 0.000000 0.000000 0.000000
75% 7.000000 0.000000 0.000000 0.000000 0.000000
max 9.000000 16.000000 36.000000 226.000000 164.000000
pixel5 pixel6 pixel7 pixel8 pixel9 \
count 60000.000000 60000.000000 60000.000000 60000.000000 60000.000000
mean 0.247967 0.411467 0.805767 2.198283 5.682000
std 4.306912 5.836188 8.215169 14.093378 23.819481
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000 0.000000 0.000000
50% 0.000000 0.000000 0.000000 0.000000 0.000000
75% 0.000000 0.000000 0.000000 0.000000 0.000000
max 227.000000 230.000000 224.000000 255.000000 254.000000
... pixel775 pixel776 pixel777 pixel778 \
count ... 60000.000000 60000.000000 60000.000000 60000.000000
mean ... 34.625400 23.300683 16.588267 17.869433
std ... 57.545242 48.854427 41.979611 43.966032
min ... 0.000000 0.000000 0.000000 0.000000
25% ... 0.000000 0.000000 0.000000 0.000000
50% ... 0.000000 0.000000 0.000000 0.000000
75% ... 58.000000 9.000000 0.000000 0.000000
max ... 255.000000 255.000000 255.000000 255.000000
pixel779 pixel780 pixel781 pixel782 pixel783 \
count 60000.000000 60000.000000 60000.000000 60000.000000 60000.000000
mean 22.814817 17.911483 8.520633 2.753300 0.855517
std 51.830477 45.149388 29.614859 17.397652 9.356960
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000 0.000000 0.000000
50% 0.000000 0.000000 0.000000 0.000000 0.000000
75% 0.000000 0.000000 0.000000 0.000000 0.000000
max 255.000000 255.000000 255.000000 255.000000 255.000000
pixel784
count 60000.00000
mean 0.07025
std 2.12587
min 0.00000
25% 0.00000
50% 0.00000
75% 0.00000
max 170.00000
[8 rows x 785 columns]
%% Cell type:code id: tags:
``` python
df_test.describe()
```
%% Output
label pixel1 pixel2 pixel3 pixel4 \
count 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000
mean 4.500000 0.000400 0.010300 0.052100 0.077000
std 2.872425 0.024493 0.525187 2.494315 2.208882
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 2.000000 0.000000 0.000000 0.000000 0.000000
50% 4.500000 0.000000 0.000000 0.000000 0.000000
75% 7.000000 0.000000 0.000000 0.000000 0.000000
max 9.000000 2.000000 45.000000 218.000000 185.000000
pixel5 pixel6 pixel7 pixel8 pixel9 \
count 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000
mean 0.208600 0.349200 0.826700 2.321200 5.457800
std 4.669183 5.657849 8.591731 15.031508 23.359019
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000 0.000000 0.000000
50% 0.000000 0.000000 0.000000 0.000000 0.000000
75% 0.000000 0.000000 0.000000 0.000000 0.000000
max 227.000000 223.000000 247.000000 218.000000 244.000000
... pixel775 pixel776 pixel777 pixel778 \
count ... 10000.000000 10000.000000 10000.000000 10000.000000
mean ... 34.320800 23.071900 16.432000 17.870600
std ... 57.888679 49.049749 42.159665 44.140552
min ... 0.000000 0.000000 0.000000 0.000000
25% ... 0.000000 0.000000 0.000000 0.000000
50% ... 0.000000 0.000000 0.000000 0.000000
75% ... 55.000000 6.000000 0.000000 0.000000
max ... 254.000000 252.000000 255.000000 255.000000
pixel779 pixel780 pixel781 pixel782 pixel783 \
count 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000
mean 22.860000 17.790200 8.353500 2.541600 0.629500
std 51.706601 45.128107 28.765769 16.417363 7.462533
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000 0.000000 0.000000
50% 0.000000 0.000000 0.000000 0.000000 0.000000
75% 1.000000 0.000000 0.000000 0.000000 0.000000
max 255.000000 255.000000 240.000000 225.000000 205.000000
pixel784
count 10000.00000
mean 0.06560
std 1.93403
min 0.00000
25% 0.00000
50% 0.00000
75% 0.00000
max 107.00000
[8 rows x 785 columns]
%% Cell type:markdown id: tags:
Both the test data and the training data appear to be valid and uncorrupted.
%% Cell type:markdown id: tags:
Define human-readable names for the 10 categories.
%% Cell type:code id: tags:
``` python
class_names = ['Top','Trouser','Pullover','Dress','Coat',
               'Sandal','Shirt','Sneaker','Bag','Ankle boot']
```
%% Cell type:markdown id: tags:
Split the dataset and check the distribution of each class (the split could also be done later).
%% Cell type:code id: tags:
``` python
df_train, df_val = train_test_split(df_train, test_size=0.1, random_state=365)
print(f"{len(df_train)} train examples")
print(f"{len(df_val)} validation examples")
print(f"{len(df_test)} test examples")
```
%% Output
54000 train examples
6000 validation examples
10000 test examples
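The split above is purely random, so the class proportions can drift slightly between sets. If exactly even class shares are needed, `train_test_split` also accepts a `stratify` argument; a sketch with synthetic labels rather than the real frames:

``` python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df_train: 100 rows, 10 of each label 0..9.
demo = pd.DataFrame({"label": np.repeat(np.arange(10), 10),
                     "pixel1": np.zeros(100, dtype=int)})

# stratify keeps each class's share identical in both splits.
tr, va = train_test_split(demo, test_size=0.1, random_state=365,
                          stratify=demo["label"])
print(va["label"].value_counts().to_dict())  # one sample per class
```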
%% Cell type:markdown id: tags:
Show the distribution for each set.
%% Cell type:code id: tags:
``` python
def get_classes_distribution(data):
    # Get the count for each label
    label_counts = data["label"].value_counts()
    # Get total number of samples
    total_samples = len(data)
    # Count the number of items in each class
    for i in range(len(label_counts)):
        label = class_names[label_counts.index[i]]
        count = label_counts.values[i]
        percent = (count / total_samples) * 100
        print("{:<20s}: {} or {}%".format(label, count, percent))

print("\nTRAIN DISTRIBUTION\n")
get_classes_distribution(df_train)
print("\nVALIDATION DISTRIBUTION\n")
get_classes_distribution(df_val)
print("\nTEST DISTRIBUTION\n")
get_classes_distribution(df_test)
```
%% Output
TRAIN DISTRIBUTION
Sandal : 5429 or 10.053703703703704%
Coat : 5421 or 10.03888888888889%
Pullover : 5407 or 10.012962962962963%
Dress : 5405 or 10.00925925925926%
Ankle boot : 5404 or 10.007407407407408%
Shirt : 5397 or 9.994444444444445%
Top : 5396 or 9.992592592592594%
Sneaker : 5395 or 9.99074074074074%
Trouser : 5384 or 9.97037037037037%
Bag : 5362 or 9.92962962962963%
VALIDATION DISTRIBUTION
Bag : 638 or 10.633333333333333%
Trouser : 616 or 10.266666666666667%
Sneaker : 605 or 10.083333333333332%
Top : 604 or 10.066666666666666%
Shirt : 603 or 10.05%
Ankle boot : 596 or 9.933333333333334%
Dress : 595 or 9.916666666666666%
Pullover : 593 or 9.883333333333333%
Coat : 579 or 9.65%
Sandal : 571 or 9.516666666666666%
TEST DISTRIBUTION
Top : 1000 or 10.0%
Trouser : 1000 or 10.0%
Pullover : 1000 or 10.0%
Dress : 1000 or 10.0%
Bag : 1000 or 10.0%
Shirt : 1000 or 10.0%
Sandal : 1000 or 10.0%
Coat : 1000 or 10.0%
Sneaker : 1000 or 10.0%
Ankle boot : 1000 or 10.0%
%% Cell type:markdown id: tags:
This is already helpful: we can see that the classes are split quite evenly.
Plot the data as a pie chart to make this even clearer.
%% Cell type:code id: tags:
``` python
def func(pct, allvalues):
    absolute = int(pct / 100.*np.sum(allvalues))
    return "{:.1f}%\n({:d})".format(pct, absolute)

def plot_pie(title, data):
    # Creating plot
    fig, ax = plt.subplots(figsize =(10, 7))
    plt.pie(data, autopct = lambda pct: func(pct, data), labels = class_names)
    ax.set_title(title)
    # show plot
    plt.show()

plot_pie("Train data distribution", df_train["label"].value_counts())
plot_pie("Validation data distribution", df_val["label"].value_counts())
plot_pie("Test data distribution", df_test["label"].value_counts())
```
%% Output
%% Cell type:markdown id: tags:
Display a single image.
%% Cell type:code id: tags:
``` python
# Make copies of the data to allow easy exploration
df_train_exp_copy = df_train.copy()
y_train_exp = df_train_exp_copy.pop('label').to_numpy()
x_train_exp = df_train_exp_copy.to_numpy()
# Take a single image, and remove the color dimension by reshaping
image = x_train_exp[0].reshape((28,28)) / 255.0
plt.figure()
plt.imshow(image, cmap=plt.cm.binary)
plt.colorbar()
plt.grid(False)
plt.show()
```
%% Output
%% Cell type:markdown id: tags:
Print one image from each category to see what they look like and how they differ.
%% Cell type:code id: tags:
``` python
plt.figure(figsize=(10, 10))
i = 0
for index in range(len(x_train_exp)):
    label = y_train_exp[index]
    image = x_train_exp[index] / 255.0
    if label == i:
        image = image.reshape((28, 28))
        plt.subplot(5, 5, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(image, cmap=plt.cm.binary)
        plt.title(class_names[label])
        i += 1
    if i == 10:
        break
plt.show()
```
%% Output
%% Cell type:markdown id: tags:
## 3. Data Preparation
%% Cell type:markdown id: tags:
The data preparation phase begins with splitting the data into training, validation, and test sets and scaling the features. The images are then reshaped from 784 columns to 28x28 (if loaded as CSV with 784 columns) and the labels are converted to a categorical format. Checking the data shapes ensures that everything is in the correct format.
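As an illustration, the preparation steps just described can be sketched on dummy data (random arrays standing in for the notebook's DataFrames; this is a minimal sketch, not the actual pipeline):

``` python
import numpy as np

# Dummy batch: 5 flattened 28x28 grayscale images with pixel values 0..255
x_dummy = np.random.randint(0, 256, size=(5, 784)).astype("float64")
y_dummy = np.array([0, 3, 9, 1, 4])

# Feature scaling: map pixel values into [0, 1]
x_dummy = x_dummy / 255.0

# Reshape the 784 columns into 28x28x1 images
x_dummy = x_dummy.reshape(x_dummy.shape[0], 28, 28, 1)

# One-hot encode the labels (equivalent to to_categorical(y_dummy, 10))
y_onehot = np.eye(10)[y_dummy]

print(x_dummy.shape, y_onehot.shape)  # (5, 28, 28, 1) (5, 10)
```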
%% Cell type:markdown id: tags:
### 3.1. Test and Train Data
%% Cell type:code id: tags:
``` python
df_train_copy = df_train.copy()
y_train = df_train_copy.pop('label').to_numpy()
x_train = df_train_copy.to_numpy()

df_val_copy = df_val.copy()
y_val = df_val_copy.pop('label').to_numpy()
x_val = df_val_copy.to_numpy()

df_test_copy = df_test.copy()
y_test = df_test_copy.pop('label').to_numpy()
x_test = df_test_copy.to_numpy()
```
%% Cell type:markdown id: tags:
### 3.2. Feature Scaling
%% Cell type:code id: tags:
``` python
x_train = x_train / 255.0
x_val = x_val / 255.0
x_test = x_test / 255.0
```
%% Cell type:markdown id: tags:
Convert the image shape from 784 to 28x28 (only needed if the data was loaded as CSV with 784 columns).
%% Cell type:code id: tags:
``` python
IMG_ROWS = 28
IMG_COLS = 28
IMAGE_SHAPE = (IMG_ROWS, IMG_COLS, 1)

x_train = x_train.reshape(x_train.shape[0], *IMAGE_SHAPE)
x_val = x_val.reshape(x_val.shape[0], *IMAGE_SHAPE)
x_test = x_test.reshape(x_test.shape[0], *IMAGE_SHAPE)
```
%% Cell type:markdown id: tags:
### 3.3. Convert Labels
%% Cell type:code id: tags:
``` python
y_train = to_categorical(y_train, 10)
y_val = to_categorical(y_val, 10)
y_test = to_categorical(y_test, 10)
```
%% Cell type:markdown id: tags:
Check the data shapes to ensure that the data is in the correct format.
%% Cell type:code id: tags:
``` python
print(x_train.shape)
print(y_train.shape)
print(x_val.shape)
print(y_val.shape)
print(x_test.shape)
print(y_test.shape)
```
%% Output
(54000, 28, 28, 1)
(54000, 10)
(6000, 28, 28, 1)
(6000, 10)
(10000, 28, 28, 1)
(10000, 10)
%% Cell type:markdown id: tags:
## 4. Modelling
%% Cell type:markdown id: tags:datenmodell
In the modelling phase the architecture of the model is defined, which comprises several types of layers. The first attempt was a simple DNN with dense layers, but the results improved after switching to a CNN architecture. The LeNet-5 implementation was chosen as the starting architecture and then adapted. The hyperparameters were tuned with a Keras tuner that tried out several combinations to select the current parameters. The model uses layers such as Dense to compute dot products, Dropout to avoid overfitting, Flatten to flatten matrices, MaxPooling2D to reduce the input dimensions, and Conv2D for convolution operations on the input data.
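The hyperparameter search itself is not part of this notebook; purely as an illustration, enumerating the tried-out combinations could look like this (a plain-Python sketch with made-up value ranges, not the actual KerasTuner API):

``` python
from itertools import product

# Hypothetical search space; the ranges actually used are not recorded here
search_space = {
    "filters": [16, 32, 64],
    "dense_units": [64, 128, 256],
    "dropout": [0.25, 0.40, 0.50],
}

# Each combination would be trained briefly and scored on the validation set
combinations = [
    dict(zip(search_space, values))
    for values in product(*search_space.values())
]

print(len(combinations))  # 27 candidate configurations
print(combinations[0])    # {'filters': 16, 'dense_units': 64, 'dropout': 0.25}
```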
%% Cell type:code id: tags:
``` python
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=3, activation='relu', padding='same', input_shape=(28, 28, 1)))
model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dropout(0.40))
model.add(Dense(units=128, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
model.summary()
```
%% Output %% Output
C:\Users\ar\anaconda3\Lib\site-packages\keras\src\layers\convolutional\base_conv.py:107: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead. C:\Users\ar\anaconda3\Lib\site-packages\keras\src\layers\convolutional\base_conv.py:107: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
super().__init__(activity_regularizer=activity_regularizer, **kwargs) super().__init__(activity_regularizer=activity_regularizer, **kwargs)
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Compile the model Modell kompilieren
%% Cell type:code id: tags:
``` python
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.categorical_crossentropy,
    metrics=['accuracy']
)
```
%% Cell type:markdown id: tags:
Train/fit the model.
%% Cell type:code id: tags:
``` python
# Determine the maximum number of epochs
NUM_EPOCHS = 10
BATCH_SIZE = 64

# Fit the model,
# specify the training data,
# the total number of epochs,
# and the validation data we just created
history = model.fit(
    x_train,
    y_train,
    batch_size=BATCH_SIZE,
    epochs=NUM_EPOCHS,
    validation_data=(x_val, y_val),
    validation_steps=10,
    verbose=2
)
```
%% Output
Epoch 1/10
844/844 - 9s - 10ms/step - accuracy: 0.8485 - loss: 0.4227 - val_accuracy: 0.9016 - val_loss: 0.2939
Epoch 2/10
844/844 - 7s - 9ms/step - accuracy: 0.8973 - loss: 0.2795 - val_accuracy: 0.9141 - val_loss: 0.2329
Epoch 3/10
844/844 - 7s - 9ms/step - accuracy: 0.9124 - loss: 0.2361 - val_accuracy: 0.9312 - val_loss: 0.2121
Epoch 4/10
844/844 - 9s - 10ms/step - accuracy: 0.9236 - loss: 0.2059 - val_accuracy: 0.9156 - val_loss: 0.2440
Epoch 5/10
844/844 - 9s - 10ms/step - accuracy: 0.9330 - loss: 0.1798 - val_accuracy: 0.9234 - val_loss: 0.1941
Epoch 6/10
844/844 - 9s - 10ms/step - accuracy: 0.9403 - loss: 0.1591 - val_accuracy: 0.9312 - val_loss: 0.1952
Epoch 7/10
844/844 - 9s - 11ms/step - accuracy: 0.9474 - loss: 0.1394 - val_accuracy: 0.9234 - val_loss: 0.1963
Epoch 8/10
844/844 - 9s - 11ms/step - accuracy: 0.9535 - loss: 0.1258 - val_accuracy: 0.9266 - val_loss: 0.2069
Epoch 9/10
844/844 - 10s - 12ms/step - accuracy: 0.9601 - loss: 0.1071 - val_accuracy: 0.9312 - val_loss: 0.2307
Epoch 10/10
844/844 - 9s - 11ms/step - accuracy: 0.9645 - loss: 0.0956 - val_accuracy: 0.9458 - val_loss: 0.2156
C:\Users\ar\anaconda3\Lib\contextlib.py:155: UserWarning: Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches. You may need to use the `.repeat()` function when building your dataset.
self.gen.throw(typ, value, traceback)
%% Cell type:markdown id: tags:
We especially want to know how the loss develops over time.
%% Cell type:code id: tags:
``` python
acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(len(acc))

plt.plot(epochs, acc, "darkgreen", label="Training accuracy")
plt.plot(epochs, val_acc, "darkblue", label="Validation accuracy")
plt.plot(epochs, loss, "lightgreen", label="Training loss")
plt.plot(epochs, val_loss, "lightblue", label="Validation loss")
plt.title("Training and validation accuracy")
plt.xlabel("Epochs")
plt.ylabel("Percent/100")
plt.legend(loc=0)
plt.figure()
plt.show()
```
%% Output
%% Cell type:markdown id: tags:
We can see that the model performs quite well, but after the second epoch it starts to overfit. To prevent that we could try different train-validation splits, add more dropout, or restructure parts of the model.
%% Cell type:markdown id: tags:
Evaluate on the test data.
%% Cell type:code id: tags:
``` python
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
```
%% Cell type:code id: tags:
``` python
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))
```
%% Output
Test loss: 0.22. Test accuracy: 93.17%
%% Cell type:markdown id: tags:
The model performs quite well with an accuracy of > 90% on the test data. The loss is acceptable.
%% Cell type:markdown id: tags:
Show the results by class.
%% Cell type:code id: tags:
``` python
# Threshold each softmax output at 0.5 to obtain binary indicator vectors
predicted_classes = (model.predict(x_test) > 0.5).astype("int32")
```
%% Output
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step
%% Cell type:code id: tags:
``` python
print(classification_report(y_test, predicted_classes, target_names=class_names))
```
%% Output %% Output
precision recall f1-score support precision recall f1-score support
Top 0.91 0.85 0.88 1000 Top 0.91 0.85 0.88 1000
Trouser 0.99 0.99 0.99 1000 Trouser 0.99 0.99 0.99 1000
Pullover 0.89 0.89 0.89 1000 Pullover 0.89 0.89 0.89 1000
Dress 0.92 0.97 0.94 1000 Dress 0.92 0.97 0.94 1000
Coat 0.93 0.86 0.89 1000 Coat 0.93 0.86 0.89 1000
Sandal 0.99 0.98 0.99 1000 Sandal 0.99 0.98 0.99 1000
Shirt 0.81 0.82 0.81 1000 Shirt 0.81 0.82 0.81 1000
Sneaker 0.96 0.98 0.97 1000 Sneaker 0.96 0.98 0.97 1000
Bag 0.98 0.99 0.99 1000 Bag 0.98 0.99 0.99 1000
Ankle boot 0.98 0.96 0.97 1000 Ankle boot 0.98 0.96 0.97 1000
micro avg 0.94 0.93 0.93 10000 micro avg 0.94 0.93 0.93 10000
macro avg 0.94 0.93 0.93 10000 macro avg 0.94 0.93 0.93 10000
weighted avg 0.94 0.93 0.93 10000 weighted avg 0.94 0.93 0.93 10000
samples avg 0.93 0.93 0.93 10000 samples avg 0.93 0.93 0.93 10000
C:\Users\ar\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1517: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior. C:\Users\ar\anaconda3\Lib\site-packages\sklearn\metrics\_classification.py:1517: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, f"{metric.capitalize()} is", len(result)) _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
%% Cell type:markdown id: tags:
## 5. Evaluation
%% Cell type:markdown id: tags:evaluation
After training and validating the model, its performance is evaluated on the test data. The model achieves a test accuracy of over 90%, indicating that it classifies new, previously unseen data well. The test loss is also acceptably low, indicating that the model's predictions are close to the actual data. In addition, class-specific metrics such as precision, recall, and F1-score for each class (e.g. Top, Trouser, Pullover, etc.) show that the model achieves good to very good results, especially for classes such as Sneaker, Bag, and Ankle boot. These results provide a comprehensive assessment of the model's performance and help judge its suitability for practical applications.
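The per-class metrics come from simple counts of correct and incorrect predictions; as an illustration, precision, recall, and F1 for a single class can be computed by hand (the counts below are made up, not taken from the report):

``` python
# Made-up counts for one class, e.g. "Shirt"
tp = 820  # shirts correctly predicted as shirts (true positives)
fp = 190  # other items wrongly predicted as shirts (false positives)
fn = 180  # shirts predicted as something else (false negatives)

precision = tp / (tp + fp)  # of all "shirt" predictions, how many were right
recall = tp / (tp + fn)     # of all actual shirts, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.81 0.82 0.82
```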
%% Cell type:markdown id: tags:
## 6. Deployment
%% Cell type:markdown id: tags:umsetzung
Within the CRISP-DM cycle, deployment is the final step, in which the trained model is prepared for productive use. This involves implementing the model in a production environment, where it can process real-time data. Before deployment, all aspects such as model performance on the test data, scalability, and integration into existing systems must be checked carefully. It is also crucial to ensure ongoing monitoring and maintenance to guarantee the long-term performance and accuracy of the model.
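The ongoing monitoring mentioned above could start as something as simple as tracking a rolling accuracy over recently labelled predictions and flagging when it drops below a threshold (a hypothetical sketch; the class name, window size, and threshold are assumptions, not part of the notebook):

``` python
from collections import deque

class AccuracyMonitor:
    """Tracks accuracy over the last `window` labelled predictions."""

    def __init__(self, window=100, threshold=0.85):
        self.results = deque(maxlen=window)  # 1 = correct, 0 = wrong
        self.threshold = threshold

    def record(self, predicted_label, true_label):
        self.results.append(1 if predicted_label == true_label else 0)

    def rolling_accuracy(self):
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_attention(self):
        acc = self.rolling_accuracy()
        return acc is not None and acc < self.threshold

# Example: two correct and two wrong predictions in a small window
monitor = AccuracyMonitor(window=4, threshold=0.75)
for pred, true in [(1, 1), (2, 2), (3, 0), (4, 0)]:
    monitor.record(pred, true)
print(monitor.rolling_accuracy())  # 0.5
print(monitor.needs_attention())   # True
```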