Initial setup of the repo · b37d50f3

Alexander Roßmann authored 3 years ago

The repo was initially set up with prepared ML services. This also includes the description of the repo in a README, style guides and requirements

style_Guide.md 1.89 KiB

Style Guide

for this Repository

All notebooks shall adhere to this guide.

File structure:

Name of ML use case  
├── data.csv  
├── notebook.ipynb  
└── README.md

The files shall be named as shown above.
If there are multiple data files, you can use a data folder.
Make sure you adjust the file path strings in your notebook!
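
One way to keep the file path strings easy to adjust is to build them in a single helper. The following is a minimal sketch, not a required pattern: the function `data_path`, the folder name `data`, and the file name `part_1.csv` are hypothetical, and it assumes the notebook is run from inside the use-case folder.

```python
from pathlib import Path

# Assumption: the notebook is started from inside the use-case folder,
# so relative paths resolve next to notebook.ipynb.
USE_CASE_DIR = Path(".")


def data_path(file_name: str = "data.csv", use_data_folder: bool = False) -> Path:
    """Return the path of a data file for the single-file or data-folder layout."""
    if use_data_folder:
        # Hypothetical folder name "data"; the guide only says "a data folder".
        return USE_CASE_DIR / "data" / file_name
    return USE_CASE_DIR / file_name


print(data_path())                                    # single-file layout
print(data_path("part_1.csv", use_data_folder=True))  # data-folder layout
```

With this, switching a use case from one data file to a data folder only changes the arguments passed to the helper, not every `pd.read_csv` call.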

For notebooks

The notebook is named notebook.ipynb.
All import statements go at the top of the file.
Everything is in English: variables, comments, figure titles, figure axes, and Markdown text.
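
The import rule can be checked automatically. The sketch below is one possible check, not part of the guide: the function `imports_are_on_top` is hypothetical, it only looks at top-level statements, and it assumes the code cells contain plain Python apart from line magics and shell escapes.

```python
import ast
import json


def imports_are_on_top(notebook_json: str) -> bool:
    """Return True if no top-level import follows other code in the notebook."""
    notebook = json.loads(notebook_json)
    seen_other_code = False
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as a list of lines
            source = "".join(source)
        # Drop IPython magics and shell escapes, which are not valid Python.
        lines = [
            line for line in source.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        for node in ast.parse("\n".join(lines)).body:
            if isinstance(node, (ast.Import, ast.ImportFrom)):
                if seen_other_code:
                    return False
            else:
                seen_other_code = True
    return True


# Tiny in-memory notebook used only for illustration.
example = json.dumps({
    "cells": [
        {"cell_type": "code", "source": ["import pandas as pd\n", "import numpy as np"]},
        {"cell_type": "code", "source": ["data_raw = pd.read_csv('data.csv')"]},
    ]
})
print(imports_are_on_top(example))
```

For a real notebook, the JSON text would come from reading notebook.ipynb. The code is only parsed, never executed.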

Adhere to PEP 8

https://www.youtube.com/watch?v=D4_s3q038I0&t=1182s
Important: use an automatic formatter such as autopep8 or black.
How-to for Jupyter Notebook

Notebook structure

A notebook should have this structure:

1. Business Understanding
2. Data and Data Understanding
2.1. Import of Relevant Modules
2.2. Read Data
2.3. Data Cleaning
2.4. Descriptive Analytics
2.4.1. Continuous Features
2.4.2. Categorical Features
3. Data Preparation
3.1. Reduce Customer ID
3.2. Recoding of Categorical Variables
3.3. Test for Multicollinearity
3.4. Feature Scaling
3.5. Undersampling
3.6. Create Test and Training Data
4. Modelling and Evaluation
4.1. Logistic Regression
4.2. Evaluation
4.3. Interpretation
4.4. Model Optimization
5. Deployment
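
To start a new use case from this structure, a skeleton notebook can be generated. This is a minimal sketch under stated assumptions: the helper `build_skeleton` is hypothetical, the top-level sections are numbered 1 to 5 as the subsection numbers imply, the heading level `#` is a free choice, and subsections are left out because they depend on the use case.

```python
import json

# Top-level sections as listed in the guide; subsections are left to the use case.
SECTIONS = [
    "Business Understanding",
    "Data and Data Understanding",
    "Data Preparation",
    "Modelling and Evaluation",
    "Deployment",
]


def build_skeleton() -> dict:
    """Build a minimal nbformat-4 notebook with one Markdown heading per section."""
    cells = [
        {"cell_type": "markdown", "metadata": {}, "source": [f"# {number}. {title}"]}
        for number, title in enumerate(SECTIONS, start=1)
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


skeleton = build_skeleton()
for cell in skeleton["cells"]:
    print(cell["source"][0])

# To create the file, write json.dumps(skeleton) to notebook.ipynb.
```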

Variable names

Always use snake_case, never camelCase!
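
A quick way to test a name against this rule is a regular expression. The sketch below assumes that snake_case means lowercase letters, digits, and single underscores, starting with a letter. The guide does not define it more precisely, and the helper `is_snake_case` is hypothetical.

```python
import re

# Assumption: snake_case = lowercase letters, digits and single underscores,
# starting with a letter (e.g. data_raw, data_1, model_lin_regression).
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """Return True if the whole name matches the snake_case pattern."""
    return SNAKE_CASE.fullmatch(name) is not None


for name in ["data_raw", "data_1", "model_lin_regression", "dataRaw", "ModelLinRegression"]:
    print(name, is_snake_case(name))
```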

Variables:

data_raw = pd.read_csv("data.csv")
data_cleaned
data_1
data_2
data_3
y = data["Target Variable"]
x = data.drop(columns=["Target Variable"])
x = scaler.fit_transform(x)
x_train, x_test, y_train, y_test = train_test_split(x_resampled, y_resampled, random_state=110)
model_lin_regression = LinearRegression()
model_lin_regression.fit(x_train, y_train)
model_nearest_neighbors = NearestNeighbors().fit(x_train)