Follow this recipe and you can score the full 40 points on Type 2!
https://www.kaggle.com/code/agileteam/t2-4-house-prices-regression
T2-4. House Prices (Regression)
# X_train, y_train, X_test (the data to predict for submission)
#EDA
# print(X_train.info())
# print(X_test.info())
# print(X_train.head())
# print(X_test.head())
# print(y_train.info())
# print(y_train.head())
'''
Columns to drop from the independent variables: Alley, FireplaceQu, PoolQC, Fence, MiscFeature
Dependent variable: SalePrice (drop Id)
Missing-value handling (X_train, X_test): LotFrontage, MasVnrType, MasVnrArea, BsmtQual, BsmtCond, BsmtExposure, BsmtFinType1,
BsmtFinType2, Electrical, GarageType, GarageYrBlt, GarageFinish, GarageQual, GarageCond
Label encoding: all object-dtype columns
'''
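# Data preprocessing - missing-value handling (fill every column with its mode)
# Assign the result back instead of using inplace=True on a column,
# which newer pandas versions warn about and may not apply to the DataFrame
for column in X_train.columns:
    mode_value = X_train[column].mode()[0]
    X_train[column] = X_train[column].fillna(mode_value)
for column in X_test.columns:
    mode_value = X_test[column].mode()[0]
    X_test[column] = X_test[column].fillna(mode_value)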
# Data preprocessing - label encoding
from sklearn.preprocessing import LabelEncoder
# Fit each encoder on the train and test values together so that the same
# category gets the same integer code in both DataFrames
for column in X_train.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    le.fit(X_train[column].tolist() + X_test[column].tolist())
    X_train[column] = le.transform(X_train[column])
    X_test[column] = le.transform(X_test[column])
# print(X_train.info())
# print(X_test.info())
# print(y_train.info())
# Split the data (train / validation)
from sklearn.model_selection import train_test_split
X = X_train.drop(columns = ['Alley', 'FireplaceQu', 'PoolQC', 'Fence', 'MiscFeature'])
y = y_train['SalePrice']
x_train, x_test, Y_train, Y_test = train_test_split(X, y, test_size = 0.2, random_state = 11)
# Modeling, training, prediction
from sklearn.ensemble import RandomForestRegressor
rfr = RandomForestRegressor(n_estimators = 30, max_depth = 10, random_state = 11)
rfr.fit(x_train, Y_train)
pred1 = rfr.predict(x_test)
# Evaluate pred1 (RMSE on the validation split)
import numpy as np
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(Y_test, pred1)
rmse = np.sqrt(mse)
print(rmse)
# Predict on the test data
test_X = X_test.drop(columns = ['Alley', 'FireplaceQu', 'PoolQC', 'Fence', 'MiscFeature'])
pred2 = rfr.predict(test_X)
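# Create and check the result file
import pandas as pd  # needed for pd below (normally imported when the data is loaded)
pd.DataFrame({'pred': pred2}).to_csv('result.csv', index = False)
result = pd.read_csv('result.csv')
print(result)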
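For reference, the RMSE printed in the evaluation step is simply the square root of the mean squared error. The following is a minimal sketch with made-up numbers (the two arrays are illustrative and do not come from the House Prices data) showing that the manual calculation agrees with the scikit-learn route used in the script:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up true and predicted prices, for illustration only
y_true = np.array([200000.0, 150000.0, 320000.0])
y_pred = np.array([210000.0, 140000.0, 300000.0])

# RMSE via scikit-learn's MSE, the same two steps as in the script
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))

# RMSE computed by hand: sqrt(mean((y - y_hat)^2))
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))

print(rmse_sklearn, rmse_manual)
```

A lower RMSE on the validation split means the predictions are closer to the actual prices, in the same units as SalePrice.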
Handle data preprocessing in one go with loops!
1. Missing-value handling
for column in df.columns:
    mode_value = df[column].mode()[0]
    df[column] = df[column].fillna(mode_value)
2. Label encoding
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
for column in df.select_dtypes(include = ['object']).columns:
    df[column] = le.fit_transform(df[column])
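When there is both a training set and a test set, the two loops can be wrapped in one helper. The sketch below is only an illustration: the function name `preprocess` and the toy DataFrames are hypothetical, and two choices in it are assumptions rather than the method of this post. The test set is filled with the training set's mode, and each encoder is fit on the train and test values together so that both share one mapping.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def preprocess(train, test):
    """Hypothetical helper: mode imputation, then label encoding with a shared mapping."""
    train, test = train.copy(), test.copy()
    # 1. Missing values: fill both frames with the mode of the training column (assumption)
    for column in train.columns:
        mode_value = train[column].mode()[0]
        train[column] = train[column].fillna(mode_value)
        test[column] = test[column].fillna(mode_value)
    # 2. Label encoding: non-numeric columns (the object-dtype columns in the post)
    for column in train.select_dtypes(exclude=['number']).columns:
        le = LabelEncoder()
        # Fit on train and test together so unseen test categories do not raise an error
        le.fit(pd.concat([train[column], test[column]]))
        train[column] = le.transform(train[column])
        test[column] = le.transform(test[column])
    return train, test

# Toy data for illustration only (values are made up)
toy_train = pd.DataFrame({
    'LotFrontage': [60.0, None, 60.0, 80.0],
    'GarageType': ['Attchd', 'Detchd', None, 'Attchd'],
})
toy_test = pd.DataFrame({
    'LotFrontage': [None, 70.0],
    'GarageType': ['Detchd', 'BuiltIn'],
})

train_p, test_p = preprocess(toy_train, toy_test)
print(train_p)
print(test_p)
```

Because the function works on copies, the original DataFrames stay unchanged, which makes it easy to re-run the preprocessing while experimenting.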