TensorFlow 8

Introduction to Neural Networks

  • Define a neural network (NN) and its hidden layers using the TensorFlow DNNRegressor class
  • Train a neural network to learn nonlinearities in a data set and achieve better performance than a linear regression model

In previous exercises, we used synthetic features to help our model learn nonlinearities.

One important set of nonlinearities involves latitude and longitude, but there may be others.

For now, we'll go back to a standard (linear) regression task, rather than the logistic regression task from the previous exercise. That is, we'll predict median_house_value directly.

Setup

Load the data and create the feature definitions.

from __future__ import print_function

import math

from IPython import display
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import metrics
import tensorflow as tf
from tensorflow.python.data import Dataset

tf.logging.set_verbosity(tf.logging.ERROR)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

california_housing_dataframe = pd.read_csv("https://download.mlcc.google.cn/mledu-datasets/california_housing_train.csv", sep=",")

california_housing_dataframe = california_housing_dataframe.reindex(
    np.random.permutation(california_housing_dataframe.index))

def preprocess_features(california_housing_dataframe):
  """Prepares input features from California housing data set.

  Args:
    california_housing_dataframe: A Pandas DataFrame expected to contain data
      from the California housing data set.
  Returns:
    A DataFrame that contains the features to be used for the model, including
    synthetic features.
  """
  selected_features = california_housing_dataframe[
    ["latitude",
     "longitude",
     "housing_median_age",
     "total_rooms",
     "total_bedrooms",
     "population",
     "households",
     "median_income"]]
  processed_features = selected_features.copy()
  # Create a synthetic feature.
  processed_features["rooms_per_person"] = (
    california_housing_dataframe["total_rooms"] /
    california_housing_dataframe["population"])
  return processed_features

def preprocess_targets(california_housing_dataframe):
  """Prepares target features (i.e., labels) from California housing data set.

  Args:
    california_housing_dataframe: A Pandas DataFrame expected to contain data
      from the California housing data set.
  Returns:
    A DataFrame that contains the target feature.
  """
  output_targets = pd.DataFrame()
  # Scale the target to be in units of thousands of dollars.
  output_targets["median_house_value"] = (
    california_housing_dataframe["median_house_value"] / 1000.0)
  return output_targets

# Choose the first 12000 (out of 17000) examples for training.
training_examples = preprocess_features(california_housing_dataframe.head(12000))
training_targets = preprocess_targets(california_housing_dataframe.head(12000))

# Choose the last 5000 (out of 17000) examples for validation.
validation_examples = preprocess_features(california_housing_dataframe.tail(5000))
validation_targets = preprocess_targets(california_housing_dataframe.tail(5000))

# Double-check that we've done the right thing.
print("Training examples summary:")
display.display(training_examples.describe())
print("Validation examples summary:")
display.display(validation_examples.describe())

print("Training targets summary:")
display.display(training_targets.describe())
print("Validation targets summary:")
display.display(validation_targets.describe())
Training examples summary:
latitude longitude housing_median_age total_rooms total_bedrooms population households median_income rooms_per_person
count 12000.0 12000.0 12000.0 12000.0 12000.0 12000.0 12000.0 12000.0 12000.0
mean 35.6 -119.6 28.7 2636.9 537.9 1429.2 500.3 3.9 2.0
std 2.1 2.0 12.5 2187.4 422.5 1168.5 387.3 1.9 1.1
min 32.5 -124.3 1.0 2.0 1.0 3.0 1.0 0.5 0.0
25% 33.9 -121.8 18.0 1461.0 296.8 788.0 281.0 2.6 1.5
50% 34.2 -118.5 29.0 2116.0 431.0 1165.0 407.5 3.5 1.9
75% 37.7 -118.0 37.0 3127.0 645.0 1713.0 601.0 4.8 2.3
max 42.0 -114.3 52.0 37937.0 6445.0 35682.0 6082.0 15.0 55.2
Validation examples summary:
latitude longitude housing_median_age total_rooms total_bedrooms population households median_income rooms_per_person
count 5000.0 5000.0 5000.0 5000.0 5000.0 5000.0 5000.0 5000.0 5000.0
mean 35.6 -119.6 28.4 2659.9 542.9 1430.4 503.5 3.9 2.0
std 2.1 2.0 12.9 2162.0 419.1 1096.8 377.9 1.9 1.3
min 32.6 -124.3 2.0 15.0 4.0 8.0 2.0 0.5 0.1
25% 33.9 -121.8 18.0 1465.8 297.0 793.0 283.0 2.6 1.5
50% 34.2 -118.5 28.0 2154.5 439.0 1173.0 413.0 3.5 1.9
75% 37.7 -118.0 37.0 3216.0 658.2 1738.0 614.0 4.7 2.3
max 41.9 -114.6 52.0 30401.0 4957.0 13251.0 4339.0 15.0 52.0
Training targets summary:
median_house_value
count 12000.0
mean 206.6
std 115.5
min 15.0
25% 119.2
50% 180.8
75% 263.3
max 500.0
Validation targets summary:
median_house_value
count 5000.0
mean 208.9
std 117.0
min 15.0
25% 120.2
50% 179.2
75% 268.3
max 500.0

Building a Neural Network

The NN is defined by the DNNRegressor class.

Use hidden_units to define the structure of the NN. The hidden_units argument takes a list of ints, where each int corresponds to a hidden layer and indicates the number of nodes in it. For example, consider the following assignment:

hidden_units=[3,10]

The preceding assignment specifies a neural net with two hidden layers:

  • The first hidden layer contains 3 nodes.
  • The second hidden layer contains 10 nodes.

If we wanted to add more layers, we'd add more ints to the list. For example, hidden_units=[10,20,30,40] would create four hidden layers with ten, twenty, thirty, and forty units, respectively.

By default, all hidden layers will use ReLU activation and will be fully connected.
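
Putting this together, a minimal sketch (it assumes the construct_feature_columns helper defined below):

# Sketch only: a DNNRegressor with two hidden layers of 3 and 10 ReLU units.
dnn = tf.estimator.DNNRegressor(
    feature_columns=construct_feature_columns(training_examples),
    hidden_units=[3, 10])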

def construct_feature_columns(input_features):
  """Construct the TensorFlow Feature Columns.

  Args:
    input_features: The names of the numerical input features to use.
  Returns:
    A set of feature columns
  """
  return set([tf.feature_column.numeric_column(my_feature)
              for my_feature in input_features])

def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
  """Creates an input function that feeds batches of (features, labels) to the model.

  Args:
    features: pandas DataFrame of features
    targets: pandas DataFrame of targets
    batch_size: Size of batches to be passed to the model
    shuffle: True or False. Whether to shuffle the data.
    num_epochs: Number of epochs for which data should be repeated. None = repeat indefinitely
  Returns:
    Tuple of (features, labels) for next data batch
  """

  # Convert pandas data into a dict of np arrays.
  features = {key: np.array(value) for key, value in dict(features).items()}

  # Construct a dataset, and configure batching/repeating.
  ds = Dataset.from_tensor_slices((features, targets))  # warning: 2GB limit
  ds = ds.batch(batch_size).repeat(num_epochs)

  # Shuffle the data, if specified.
  if shuffle:
    ds = ds.shuffle(10000)

  # Return the next batch of data.
  features, labels = ds.make_one_shot_iterator().get_next()
  return features, labels

def train_nn_regression_model(
    learning_rate,
    steps,
    batch_size,
    hidden_units,
    training_examples,
    training_targets,
    validation_examples,
    validation_targets):
  """Trains a neural network regression model.

  In addition to training, this function also prints training progress information,
  as well as a plot of the training and validation loss over time.

  Args:
    learning_rate: A `float`, the learning rate.
    steps: A non-zero `int`, the total number of training steps. A training step
      consists of a forward and backward pass using a single batch.
    batch_size: A non-zero `int`, the batch size.
    hidden_units: A `list` of int values, specifying the number of neurons in each layer.
    training_examples: A `DataFrame` containing one or more columns from
      `california_housing_dataframe` to use as input features for training.
    training_targets: A `DataFrame` containing exactly one column from
      `california_housing_dataframe` to use as target for training.
    validation_examples: A `DataFrame` containing one or more columns from
      `california_housing_dataframe` to use as input features for validation.
    validation_targets: A `DataFrame` containing exactly one column from
      `california_housing_dataframe` to use as target for validation.

  Returns:
    A `DNNRegressor` object trained on the training data.
  """

  periods = 10
  steps_per_period = steps / periods

  # Create a DNNRegressor object.
  my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
  my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
  dnn_regressor = tf.estimator.DNNRegressor(
      feature_columns=construct_feature_columns(training_examples),
      hidden_units=hidden_units,
      optimizer=my_optimizer
  )

  # Create input functions.
  training_input_fn = lambda: my_input_fn(training_examples,
                                          training_targets["median_house_value"],
                                          batch_size=batch_size)
  predict_training_input_fn = lambda: my_input_fn(training_examples,
                                                  training_targets["median_house_value"],
                                                  num_epochs=1,
                                                  shuffle=False)
  predict_validation_input_fn = lambda: my_input_fn(validation_examples,
                                                    validation_targets["median_house_value"],
                                                    num_epochs=1,
                                                    shuffle=False)

  # Train the model, but do so inside a loop so that we can periodically assess
  # loss metrics.
  print("Training model...")
  print("RMSE (on training data):")

  training_rmse = []
  validation_rmse = []

  for period in range(0, periods):
    # Train the model, starting from the prior state.
    dnn_regressor.train(
        input_fn=training_input_fn,
        steps=steps_per_period
    )
    # Take a break and compute predictions.
    training_predictions = dnn_regressor.predict(input_fn=predict_training_input_fn)
    training_predictions = np.array([item['predictions'][0] for item in training_predictions])

    validation_predictions = dnn_regressor.predict(input_fn=predict_validation_input_fn)
    validation_predictions = np.array([item['predictions'][0] for item in validation_predictions])

    # Compute training and validation loss.
    training_root_mean_squared_error = math.sqrt(
        metrics.mean_squared_error(training_predictions, training_targets))
    validation_root_mean_squared_error = math.sqrt(
        metrics.mean_squared_error(validation_predictions, validation_targets))

    # Occasionally print the current loss.
    print("  period %02d : %0.2f" % (period, training_root_mean_squared_error))
    # Add the loss metrics from this period to our list.
    training_rmse.append(training_root_mean_squared_error)
    validation_rmse.append(validation_root_mean_squared_error)

  print("Model training finished.")
  # Output a graph of loss metrics over periods.
  plt.ylabel("RMSE")
  plt.xlabel("Periods")
  plt.title("Root Mean Squared Error vs. Periods")
  plt.tight_layout()
  plt.plot(training_rmse, label="training")
  plt.plot(validation_rmse, label="validation")
  plt.legend()

  print("Final RMSE (on training data):   %0.2f" % training_root_mean_squared_error)
  print("Final RMSE (on validation data): %0.2f" % validation_root_mean_squared_error)

  return dnn_regressor

Train the NN Model

Tune the hyperparameters, with the goal of getting RMSE below 110.

Recall that in the linear regression exercise with many features, an RMSE of around 110 was already a pretty good result. We'll aim to beat that.

With neural nets, overfitting is a genuine hazard. You can look at the gap between loss on the training data and loss on the validation data to help judge whether your model is starting to overfit. If the gap starts to grow, that is usually a sure sign of overfitting.
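
A minimal sketch of that check, assuming the per-period RMSE lists (training_rmse, validation_rmse) collected inside the training loop above:

# Sketch: a widening validation-minus-training RMSE gap across periods
# suggests overfitting.
def rmse_gap(training_rmse, validation_rmse):
  return [v - t for t, v in zip(training_rmse, validation_rmse)]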

The parameters below are the ones I settled on; better settings may well achieve a lower RMSE.

dnn_regressor = train_nn_regression_model(
    learning_rate=0.002,
    steps=2000,
    batch_size=100,
    hidden_units=[8, 10],
    training_examples=training_examples,
    training_targets=training_targets,
    validation_examples=validation_examples,
    validation_targets=validation_targets)
Training model...
RMSE (on training data):
  period 00 : 153.67
  period 01 : 136.38
  period 02 : 119.26
  period 03 : 106.87
  period 04 : 107.80
  period 05 : 106.81
  period 06 : 107.47
  period 07 : 109.00
  period 08 : 104.58
  period 09 : 106.55
Model training finished.
Final RMSE (on training data):   106.55
Final RMSE (on validation data): 106.82

[Plot: training and validation RMSE vs. periods]

Evaluate on Test Data

Confirm that your validation performance holds up on the test data.

Once you have a model you're happy with, evaluate it on the test data and compare that to the validation performance.

The test data set is located here (the URL appears in the code below).

california_housing_test_data = pd.read_csv("https://download.mlcc.google.cn/mledu-datasets/california_housing_test.csv", sep=",")

test_examples = preprocess_features(california_housing_test_data)
test_targets = preprocess_targets(california_housing_test_data)

predict_testing_input_fn = lambda: my_input_fn(test_examples,
                                               test_targets["median_house_value"],
                                               num_epochs=1,
                                               shuffle=False)

test_predictions = dnn_regressor.predict(input_fn=predict_testing_input_fn)
test_predictions = np.array([item['predictions'][0] for item in test_predictions])

root_mean_squared_error = math.sqrt(
    metrics.mean_squared_error(test_predictions, test_targets))

print("Final RMSE (on test data): %0.2f" % root_mean_squared_error)
Final RMSE (on test data): 105.23

Improving Neural Network Performance

Improve the performance of the neural network by normalizing features and applying various optimization algorithms.

NOTE: The optimization approaches described in this exercise are not specific to neural networks; they are effective ways to improve most types of models.

def train_nn_regression_model_optimize(
    my_optimizer,
    steps,
    batch_size,
    hidden_units,
    training_examples,
    training_targets,
    validation_examples,
    validation_targets):
  """Trains a neural network regression model.

  In addition to training, this function also prints training progress information,
  as well as a plot of the training and validation loss over time.

  Args:
    my_optimizer: An instance of `tf.train.Optimizer`, the optimizer to use.
    steps: A non-zero `int`, the total number of training steps. A training step
      consists of a forward and backward pass using a single batch.
    batch_size: A non-zero `int`, the batch size.
    hidden_units: A `list` of int values, specifying the number of neurons in each layer.
    training_examples: A `DataFrame` containing one or more columns from
      `california_housing_dataframe` to use as input features for training.
    training_targets: A `DataFrame` containing exactly one column from
      `california_housing_dataframe` to use as target for training.
    validation_examples: A `DataFrame` containing one or more columns from
      `california_housing_dataframe` to use as input features for validation.
    validation_targets: A `DataFrame` containing exactly one column from
      `california_housing_dataframe` to use as target for validation.

  Returns:
    A tuple `(estimator, training_losses, validation_losses)`:
      estimator: the trained `DNNRegressor` object.
      training_losses: a `list` containing the training loss values taken during training.
      validation_losses: a `list` containing the validation loss values taken during training.
  """

  periods = 10
  steps_per_period = steps / periods

  # Create a DNNRegressor object.
  my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
  dnn_regressor = tf.estimator.DNNRegressor(
      feature_columns=construct_feature_columns(training_examples),
      hidden_units=hidden_units,
      optimizer=my_optimizer
  )

  # Create input functions.
  training_input_fn = lambda: my_input_fn(training_examples,
                                          training_targets["median_house_value"],
                                          batch_size=batch_size)
  predict_training_input_fn = lambda: my_input_fn(training_examples,
                                                  training_targets["median_house_value"],
                                                  num_epochs=1,
                                                  shuffle=False)
  predict_validation_input_fn = lambda: my_input_fn(validation_examples,
                                                    validation_targets["median_house_value"],
                                                    num_epochs=1,
                                                    shuffle=False)

  # Train the model, but do so inside a loop so that we can periodically assess
  # loss metrics.
  print("Training model...")
  print("RMSE (on training data):")
  training_rmse = []
  validation_rmse = []
  for period in range(0, periods):
    # Train the model, starting from the prior state.
    dnn_regressor.train(
        input_fn=training_input_fn,
        steps=steps_per_period
    )
    # Take a break and compute predictions.
    training_predictions = dnn_regressor.predict(input_fn=predict_training_input_fn)
    training_predictions = np.array([item['predictions'][0] for item in training_predictions])

    validation_predictions = dnn_regressor.predict(input_fn=predict_validation_input_fn)
    validation_predictions = np.array([item['predictions'][0] for item in validation_predictions])

    # Compute training and validation loss.
    training_root_mean_squared_error = math.sqrt(
        metrics.mean_squared_error(training_predictions, training_targets))
    validation_root_mean_squared_error = math.sqrt(
        metrics.mean_squared_error(validation_predictions, validation_targets))
    # Occasionally print the current loss.
    print("  period %02d : %0.2f" % (period, training_root_mean_squared_error))
    # Add the loss metrics from this period to our list.
    training_rmse.append(training_root_mean_squared_error)
    validation_rmse.append(validation_root_mean_squared_error)
  print("Model training finished.")

  # Output a graph of loss metrics over periods.
  plt.ylabel("RMSE")
  plt.xlabel("Periods")
  plt.title("Root Mean Squared Error vs. Periods")
  plt.tight_layout()
  plt.plot(training_rmse, label="training")
  plt.plot(validation_rmse, label="validation")
  plt.legend()

  print("Final RMSE (on training data):   %0.2f" % training_root_mean_squared_error)
  print("Final RMSE (on validation data): %0.2f" % validation_root_mean_squared_error)

  return dnn_regressor, training_rmse, validation_rmse

_ = train_nn_regression_model_optimize(
    my_optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.0007),
    steps=5000,
    batch_size=70,
    hidden_units=[10, 10],
    training_examples=training_examples,
    training_targets=training_targets,
    validation_examples=validation_examples,
    validation_targets=validation_targets)
Training model...
RMSE (on training data):
  period 00 : 162.48
  period 01 : 157.66
  period 02 : 150.85
  period 03 : 142.03
  period 04 : 131.60
  period 05 : 120.72
  period 06 : 113.46
  period 07 : 110.01
  period 08 : 108.28
  period 09 : 107.34
Model training finished.
Final RMSE (on training data):   107.34
Final RMSE (on validation data): 108.43

[Plot: training and validation RMSE vs. periods]

Linear Scaling

Normalizing the inputs to fall within the range (-1, 1) can be a good standard practice. This helps SGD avoid getting stuck taking steps that are too large in one dimension or too small in another. Fans of numerical optimization may note that there's a connection here to the idea of using a preconditioner.

def linear_scale(series):
  min_val = series.min()
  max_val = series.max()
  scale = (max_val - min_val) / 2.0
  return series.apply(lambda x: ((x - min_val) / scale) - 1.0)
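
As a quick worked example: latitude in this data spans roughly [32.5, 42.0] (see the training summary above), so the scale is (42.0 - 32.5) / 2 = 4.75 and the endpoints map to -1 and +1:

# Worked example of linear_scale on a toy series.
print(linear_scale(pd.Series([32.5, 37.25, 42.0])))
# -> -1.0, 0.0, 1.0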

Normalize the Features Using Linear Scaling

Normalize the inputs to the scale (-1, 1). How well can you do?

As a rule of thumb, NNs train best when the input features are roughly on the same scale.

Sanity-check your normalized data. (What would happen if you forgot to normalize one feature?)
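
One possible sanity check, sketched here (it assumes the normalized_dataframe built in the cell below):

# Every linearly scaled feature should span roughly [-1, 1]; a feature you
# forgot to normalize would stand out immediately in the min/max rows.
print(normalized_dataframe.describe().loc[["min", "max"]])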

Since normalization uses min and max, we have to make sure it's applied to the entire data set at once.

We can do that here because all our data is in a single DataFrame. If we had multiple data sets, a good practice would be to derive the normalization parameters from the training set and apply those identically to the test set.
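
A sketch of that multi-data-set variant (fit_linear_scale is a hypothetical helper, not part of this exercise):

# Derive min/max from the training set only, then reuse the same parameters
# for every other split.
def fit_linear_scale(series):
  min_val, max_val = series.min(), series.max()
  scale = (max_val - min_val) / 2.0
  return lambda s: s.apply(lambda x: ((x - min_val) / scale) - 1.0)

scale_income = fit_linear_scale(training_examples["median_income"])
train_income = scale_income(training_examples["median_income"])
test_income = scale_income(test_examples["median_income"])  # same parameters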

def normalize_linear_scale(examples_dataframe):
  """Returns a version of the input `DataFrame` that has all its features normalized linearly."""
  processed_features = pd.DataFrame()
  processed_features["latitude"] = linear_scale(examples_dataframe["latitude"])
  processed_features["longitude"] = linear_scale(examples_dataframe["longitude"])
  processed_features["housing_median_age"] = linear_scale(examples_dataframe["housing_median_age"])
  processed_features["total_rooms"] = linear_scale(examples_dataframe["total_rooms"])
  processed_features["total_bedrooms"] = linear_scale(examples_dataframe["total_bedrooms"])
  processed_features["population"] = linear_scale(examples_dataframe["population"])
  processed_features["households"] = linear_scale(examples_dataframe["households"])
  processed_features["median_income"] = linear_scale(examples_dataframe["median_income"])
  processed_features["rooms_per_person"] = linear_scale(examples_dataframe["rooms_per_person"])
  return processed_features

normalized_dataframe = normalize_linear_scale(preprocess_features(california_housing_dataframe))
normalized_training_examples = normalized_dataframe.head(12000)
normalized_validation_examples = normalized_dataframe.tail(5000)

_ = train_nn_regression_model_optimize(
    my_optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.005),
    steps=2000,
    batch_size=50,
    hidden_units=[10, 10],
    training_examples=normalized_training_examples,
    training_targets=training_targets,
    validation_examples=normalized_validation_examples,
    validation_targets=validation_targets)
Training model...
RMSE (on training data):
  period 00 : 163.23
  period 01 : 115.65
  period 02 : 106.20
  period 03 : 91.27
  period 04 : 79.65
  period 05 : 76.10
  period 06 : 74.00
  period 07 : 72.52
  period 08 : 71.52
  period 09 : 70.72
Model training finished.
Final RMSE (on training data):   70.72
Final RMSE (on validation data): 72.52

[Plot: training and validation RMSE vs. periods]

Try a Different Optimizer

Use the AdaGrad and Adam optimizers and compare their performance.

One alternative is the AdaGrad optimizer. The key insight of AdaGrad is that it modifies the learning rate adaptively for each coefficient in the model, monotonically lowering the effective learning rate. This works great for convex problems, but isn't always ideal for the non-convex problem of neural net training. You can use AdaGrad by specifying AdagradOptimizer instead of GradientDescentOptimizer. Note that with AdaGrad you may want to use a larger learning rate.
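
The per-parameter update can be sketched like this (an illustration of the idea only, not the TF internals; grad, accum, param, and epsilon are hypothetical names):

# AdaGrad keeps a running sum of squared gradients per parameter; dividing by
# its square root monotonically shrinks the effective learning rate.
accum = accum + grad ** 2
param = param - learning_rate * grad / (np.sqrt(accum) + epsilon)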

For non-convex optimization problems, Adam is sometimes more efficient than AdaGrad. To use Adam, invoke the tf.train.AdamOptimizer method. This method takes several optional hyperparameters as arguments, but our solution only specifies one of these (learning_rate). In a production setting, you should specify and tune the optional hyperparameters carefully.
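
For reference, a sketch that spells out those optional arguments with the TF 1.x documented defaults:

# Equivalent to tf.train.AdamOptimizer() with the defaults made explicit.
adam = tf.train.AdamOptimizer(
    learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08)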

# First, let's try AdaGrad.
_, adagrad_training_losses, adagrad_validation_losses = train_nn_regression_model_optimize(
    my_optimizer=tf.train.AdagradOptimizer(learning_rate=0.5),
    steps=500,
    batch_size=100,
    hidden_units=[10, 10],
    training_examples=normalized_training_examples,
    training_targets=training_targets,
    validation_examples=normalized_validation_examples,
    validation_targets=validation_targets)
Training model...
RMSE (on training data):
  period 00 : 84.89
  period 01 : 72.08
  period 02 : 71.06
  period 03 : 71.18
  period 04 : 69.57
  period 05 : 72.57
  period 06 : 69.04
  period 07 : 67.49
  period 08 : 68.83
  period 09 : 69.24
Model training finished.
Final RMSE (on training data):   69.24
Final RMSE (on validation data): 71.91

[Plot: training and validation RMSE vs. periods]

# Now let's try Adam.
_, adam_training_losses, adam_validation_losses = train_nn_regression_model_optimize(
    my_optimizer=tf.train.AdamOptimizer(learning_rate=0.009),
    steps=500,
    batch_size=100,
    hidden_units=[10, 10],
    training_examples=normalized_training_examples,
    training_targets=training_targets,
    validation_examples=normalized_validation_examples,
    validation_targets=validation_targets)
Training model...
RMSE (on training data):
  period 00 : 171.53
  period 01 : 108.88
  period 02 : 100.89
  period 03 : 88.85
  period 04 : 76.59
  period 05 : 72.77
  period 06 : 70.84
  period 07 : 70.29
  period 08 : 69.79
  period 09 : 68.84
Model training finished.
Final RMSE (on training data):   68.84
Final RMSE (on validation data): 70.98

[Plot: training and validation RMSE vs. periods]

# Output a graph of the loss metrics side by side.
plt.ylabel("RMSE")
plt.xlabel("Periods")
plt.title("Root Mean Squared Error vs. Periods")
plt.plot(adagrad_training_losses, label='Adagrad training')
plt.plot(adagrad_validation_losses, label='Adagrad validation')
plt.plot(adam_training_losses, label='Adam training')
plt.plot(adam_validation_losses, label='Adam validation')
_ = plt.legend()

[Plot: Adagrad vs. Adam training and validation RMSE vs. periods]

Try Alternate Normalization Methods

Try alternate normalizations for various features to further improve performance.

If you look closely at the summary stats for the transformed data, you may notice that linearly scaling some features leaves them clumped close to -1.

For example, many features have a median of around -0.8 rather than 0.0.
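
A quick check of those medians, using the normalized_training_examples DataFrame from the previous step:

# Per-feature medians of the linearly scaled training data; several sit
# near -0.8 rather than 0.0.
print(normalized_training_examples.median())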

_ = training_examples.hist(bins=20, figsize=(18, 12), xlabelsize=2)

[Plot: histograms of the training features]

We might be able to do better by choosing additional ways to transform these features.

For example, a log scaling might help some features. Or clipping extreme values may make the remainder of the scale more informative.

def log_normalize(series):
  return series.apply(lambda x: math.log(x + 1.0))

def clip(series, clip_to_min, clip_to_max):
  return series.apply(lambda x: (
    min(max(x, clip_to_min), clip_to_max)))

def z_score_normalize(series):
  mean = series.mean()
  std_dv = series.std()
  return series.apply(lambda x: (x - mean) / std_dv)

def binary_threshold(series, threshold):
  return series.apply(lambda x: (1 if x > threshold else 0))

The block above contains some additional possible normalization functions.

Note that if you normalize the target, you'll need to denormalize the network's predictions in order to compare loss values.
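
A sketch of that denormalization, assuming the targets were z-score normalized and that target_mean and target_std (hypothetical names) were saved when the targets were transformed:

# Invert the z-score transform before computing RMSE against the raw targets.
denormalized_predictions = validation_predictions * target_std + target_mean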

def normalize(examples_dataframe):
  """Returns a version of the input `DataFrame` that has all its features normalized."""
  processed_features = pd.DataFrame()

  processed_features["households"] = log_normalize(examples_dataframe["households"])
  processed_features["median_income"] = log_normalize(examples_dataframe["median_income"])
  processed_features["total_bedrooms"] = log_normalize(examples_dataframe["total_bedrooms"])

  processed_features["latitude"] = linear_scale(examples_dataframe["latitude"])
  processed_features["longitude"] = linear_scale(examples_dataframe["longitude"])
  processed_features["housing_median_age"] = linear_scale(examples_dataframe["housing_median_age"])

  processed_features["population"] = linear_scale(clip(examples_dataframe["population"], 0, 5000))
  processed_features["rooms_per_person"] = linear_scale(clip(examples_dataframe["rooms_per_person"], 0, 5))
  processed_features["total_rooms"] = linear_scale(clip(examples_dataframe["total_rooms"], 0, 10000))

  return processed_features

normalized_dataframe = normalize(preprocess_features(california_housing_dataframe))
normalized_training_examples = normalized_dataframe.head(12000)
normalized_validation_examples = normalized_dataframe.tail(5000)

_ = train_nn_regression_model_optimize(
    my_optimizer=tf.train.AdagradOptimizer(learning_rate=0.15),
    steps=1000,
    batch_size=50,
    hidden_units=[10, 10],
    training_examples=normalized_training_examples,
    training_targets=training_targets,
    validation_examples=normalized_validation_examples,
    validation_targets=validation_targets)
Training model...
RMSE (on training data):
  period 00 : 89.38
  period 01 : 75.15
  period 02 : 72.35
  period 03 : 70.70
  period 04 : 70.95
  period 05 : 69.13
  period 06 : 68.46
  period 07 : 68.42
  period 08 : 68.68
  period 09 : 67.93
Model training finished.
Final RMSE (on training data):   67.93
Final RMSE (on validation data): 69.55

[Plot: training and validation RMSE vs. periods]

Use Only Latitude and Longitude Features

Train a NN model that uses only latitude and longitude as features.

Real estate people are fond of saying that location is the only important feature in housing price.
Let's see if we can confirm this by training a model that uses only latitude and longitude as features.

This will only work well if our NN can learn complex nonlinearities from latitude and longitude.

NOTE: We may need a network structure that has more layers than we used earlier in the exercise.

def location_location_location(examples_dataframe):
  """Returns a version of the input `DataFrame` that keeps only the latitude and longitude."""
  processed_features = pd.DataFrame()
  processed_features["latitude"] = linear_scale(examples_dataframe["latitude"])
  processed_features["longitude"] = linear_scale(examples_dataframe["longitude"])
  return processed_features

lll_dataframe = location_location_location(preprocess_features(california_housing_dataframe))
lll_training_examples = lll_dataframe.head(12000)
lll_validation_examples = lll_dataframe.tail(5000)

_ = train_nn_regression_model_optimize(
    my_optimizer=tf.train.AdagradOptimizer(learning_rate=0.05),
    steps=500,
    batch_size=50,
    hidden_units=[10, 10, 5, 5, 5],
    training_examples=lll_training_examples,
    training_targets=training_targets,
    validation_examples=lll_validation_examples,
    validation_targets=validation_targets)
Training model...
RMSE (on training data):
  period 00 : 157.77
  period 01 : 107.67
  period 02 : 105.43
  period 03 : 104.46
  period 04 : 103.15
  period 05 : 101.82
  period 06 : 101.01
  period 07 : 100.61
  period 08 : 100.10
  period 09 : 99.77
Model training finished.
Final RMSE (on training data):   99.77
Final RMSE (on validation data): 100.48

[Plot: training and validation RMSE vs. periods]

It's a good idea to keep latitude and longitude normalized. For a model with only two features, the result isn't too bad. Of course, property values can still vary significantly within short distances.
