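Deploy a Machine Learning Model to a Real-Time Inference Endpoint

Overview

In this tutorial, you learn how to deploy a trained machine learning (ML) model to a real-time inference endpoint using Amazon SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) for ML. It provides a fully managed Jupyter notebook interface in which you can perform end-to-end ML lifecycle tasks, including model deployment.

SageMaker offers several inference options to support a range of use cases:

SageMaker Real-Time Inference for workloads with low-latency requirements on the order of milliseconds
SageMaker Serverless Inference for workloads with intermittent or infrequent traffic patterns
SageMaker Asynchronous Inference for inferences with large payloads or long processing times
SageMaker Batch Transform to run predictions on batches of data

In this tutorial, you use the Real-Time Inference option to deploy a binary classification XGBoost model that has already been trained on a synthetic auto insurance claims dataset. The dataset consists of details and extracted features from the claims and customer tables, along with a fraud column indicating whether a claim was fraudulent. The model predicts the probability that a claim is fraudulent. You play the role of a machine learning engineer who deploys this model and runs sample inferences.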

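What you will learn

In this guide, you will:

Create a SageMaker model from a trained model artifact
Configure and deploy a real-time inference endpoint to serve the model
Invoke the endpoint to run sample predictions using test data
Attach an autoscaling policy to the endpoint to handle traffic changes

Prerequisites

Before starting this guide, you will need:

An AWS account: If you don't already have an account, follow the Setting Up Your AWS Environment getting started guide for a quick overview.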

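AWS experience: Beginner

Time to complete: 25 minutes

Cost to complete: See SageMaker pricing to estimate the cost of this tutorial.

Requires: You must be logged in to an AWS account.

Services used: Amazon SageMaker Real-Time Inference, Amazon SageMaker Studio

Last updated: May 19, 2022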

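Implementation

Step 1: Set up your Amazon SageMaker Studio domain

With Amazon SageMaker, you can deploy a model visually using the console or programmatically using either SageMaker Studio or SageMaker notebooks. In this tutorial, you deploy the model programmatically using a SageMaker Studio notebook, which requires a SageMaker Studio domain.

An AWS account can have only one SageMaker Studio domain per Region. If you already have a SageMaker Studio domain in the US East (N. Virginia) Region, follow the SageMaker Studio setup guide to attach the required AWS IAM policies to your SageMaker Studio account, then skip Step 1 and proceed directly to Step 2.

If you don't have an existing SageMaker Studio domain, continue with Step 1 to run an AWS CloudFormation template that creates a SageMaker Studio domain and adds the permissions required for the rest of this tutorial.

Choose the AWS CloudFormation stack link. This link opens the AWS CloudFormation console and creates your SageMaker Studio domain and a user named studio-user. It also adds the required permissions to your SageMaker Studio account. In the CloudFormation console, confirm that US East (N. Virginia) is the Region displayed in the upper right corner. The stack name should be CFN-SM-IM-Lambda-catalog and should not be changed. This stack takes about 10 minutes to create all the resources.

This stack assumes that you already have a public VPC set up in your account. If you do not have a public VPC, see VPC with a single public subnet to learn how to create one.

Create the CloudFormation stack to set up Amazon SageMaker Studio

Select I acknowledge that AWS CloudFormation might create IAM resources, and then choose Create stack.

On the CloudFormation pane, choose Stacks. It takes about 10 minutes for the stack to be created. When creation finishes, the stack status changes from CREATE_IN_PROGRESS to CREATE_COMPLETE.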

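Step 2: Set up a SageMaker Studio notebook

In this step, you launch a new SageMaker Studio notebook, install the necessary open source libraries, and configure the SageMaker variables needed to fetch the trained model artifact from Amazon Simple Storage Service (Amazon S3). Because a model artifact cannot be deployed for inference directly, you first need to create a SageMaker model from it. The created model contains the training and inference code that SageMaker uses when it deploys the model.

Enter SageMaker Studio into the console search bar, and then choose SageMaker Studio.

Choose US East (N. Virginia) from the Region dropdown list in the upper right corner of the SageMaker console. For Launch app, select Studio to open SageMaker Studio using the studio-user profile.

Open the SageMaker Studio interface. On the navigation bar, choose File, New, Notebook.

In the Set up notebook environment dialog box, under Image, select Data Science. The Python 3 kernel is selected automatically. Choose Select.

The kernel in the upper right corner of the notebook should now display Python 3 (Data Science).

Copy and paste the following code snippet into a cell in the notebook, and press Shift+Enter to run the cell. This updates the aiobotocore library, an API for interacting with many AWS services. Ignore any warnings to restart the kernel and any dependency conflict errors.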

%pip install --upgrade -q aiobotocore

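You also need to instantiate the S3 client object and define the locations inside your default S3 bucket where metrics and model artifacts are uploaded. To do so, copy and paste the following code into a cell in the notebook and run it. Note that the SageMaker session on line 16 of the code automatically creates the write bucket, named sagemaker-<your-Region>-<your-account-id>. The dataset used for training is stored in a public S3 bucket named sagemaker-sample-files, which is specified as the read bucket on line 29. The location inside the bucket is specified through the read prefix.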

import pandas as pd
import numpy as np
import boto3
import sagemaker
import time
import json
import io
from io import StringIO
import base64
import pprint
import re

from sagemaker.image_uris import retrieve

sess = sagemaker.Session()
write_bucket = sess.default_bucket()
write_prefix = "fraud-detect-demo"

region = sess.boto_region_name
s3_client = boto3.client("s3", region_name=region)
sm_client = boto3.client("sagemaker", region_name=region)
sm_runtime_client = boto3.client("sagemaker-runtime")
sm_autoscaling_client = boto3.client("application-autoscaling")

sagemaker_role = sagemaker.get_execution_role()


# S3 locations used for parameterizing the notebook run
read_bucket = "sagemaker-sample-files"
read_prefix = "datasets/tabular/synthetic_automobile_claims" 
model_prefix = "models/xgb-fraud"

data_capture_key = f"{write_prefix}/data-capture"

# S3 location of trained model artifact
model_uri = f"s3://{read_bucket}/{model_prefix}/fraud-det-xgb-model.tar.gz"

# S3 path where data captured at endpoint will be stored
data_capture_uri = f"s3://{write_bucket}/{data_capture_key}"

# S3 location of test data
test_data_uri = f"s3://{read_bucket}/{read_prefix}/test.csv"
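
The S3 locations above are plain s3://bucket/key strings. As a minimal sketch of how such a URI breaks down into a bucket name and an object key, the helper below uses only the Python standard library. The function name split_s3_uri is hypothetical and is not part of the tutorial's code or of any AWS SDK.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into a (bucket, key) tuple."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The bucket is the "host" part; the key is the path without the leading slash
    return parsed.netloc, parsed.path.lstrip("/")


# Same shape as model_uri in the tutorial code
bucket, key = split_s3_uri(
    "s3://sagemaker-sample-files/models/xgb-fraud/fraud-det-xgb-model.tar.gz"
)
print(bucket)
print(key)
```

This can be handy when a Boto3 S3 call expects separate Bucket and Key arguments rather than a full URI.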

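Step 3: Create a real-time inference endpoint

SageMaker offers multiple ways to deploy a trained model to a real-time inference endpoint: the SageMaker SDK, the AWS SDK for Python (Boto3), and the SageMaker console. For more information, see Deploy Models for Inference in the Amazon SageMaker Developer Guide. The SageMaker SDK has more abstractions than Boto3, which exposes lower-level APIs for greater control over model deployment. In this tutorial, you deploy the model using Boto3. To deploy the model, follow these three steps in sequence:

Create a SageMaker model from the model artifact
Create an endpoint configuration to specify properties, including instance type and count
Create the endpoint using the endpoint configuration

To create a SageMaker model from the trained model artifact stored in S3, copy and paste the following code. The create_model method takes as parameters the Docker container image (for this model, the XGBoost container) and the S3 location of the model artifact.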

# Retrieve the SageMaker managed XGBoost image
training_image = retrieve(framework="xgboost", region=region, version="1.3-1")

# Specify a unique model name that does not exist
model_name = "fraud-detect-xgb"
primary_container = {
                     "Image": training_image,
                     "ModelDataUrl": model_uri
                    }

model_matches = sm_client.list_models(NameContains=model_name)["Models"]
if not model_matches:
    model = sm_client.create_model(ModelName=model_name,
                                   PrimaryContainer=primary_container,
                                   ExecutionRoleArn=sagemaker_role)
else:
    print(f"Model with name {model_name} already exists! Change model name to create new")

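You can check the created model in the SageMaker console under the Models section.

After creating the SageMaker model, copy and paste the following code to configure the endpoint with the Boto3 create_endpoint_config method. The main inputs to create_endpoint_config are the endpoint configuration name and the variant information, such as the inference instance type and count, the name of the model to deploy, and the share of traffic the endpoint should handle. Alongside these settings, you can also set up data capture by specifying a DataCaptureConfig. This feature lets you configure the real-time endpoint to capture and store requests, responses, or both in Amazon S3. Data capture is one of the steps in setting up model monitoring. When combined with baseline metrics and monitoring jobs, it helps you monitor model performance by comparing test data metrics against the baseline. Such monitoring is useful for scheduling model retraining based on model or data drift, and for auditing purposes. In the current setup, both the input (incoming test data) and the output (model predictions) are captured and stored in your default S3 bucket.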

# Endpoint Config name
endpoint_config_name = f"{model_name}-endpoint-config"

# Endpoint config parameters
production_variant_dict = {
                           "VariantName": "Alltraffic",
                           "ModelName": model_name,
                           "InitialInstanceCount": 1,
                           "InstanceType": "ml.m5.xlarge",
                           "InitialVariantWeight": 1
                          }

# Data capture config parameters
data_capture_config_dict = {
                            "EnableCapture": True,
                            "InitialSamplingPercentage": 100,
                            "DestinationS3Uri": data_capture_uri,
                            "CaptureOptions": [{"CaptureMode" : "Input"}, {"CaptureMode" : "Output"}]
                           }


# Create endpoint config if one with the same name does not exist
endpoint_config_matches = sm_client.list_endpoint_configs(NameContains=endpoint_config_name)["EndpointConfigs"]
if not endpoint_config_matches:
    endpoint_config_response = sm_client.create_endpoint_config(
                                                                EndpointConfigName=endpoint_config_name,
                                                                ProductionVariants=[production_variant_dict],
                                                                DataCaptureConfig=data_capture_config_dict
                                                               )
else:
    print(f"Endpoint config with name {endpoint_config_name} already exists! Change endpoint config name to create new")
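
The endpoint configuration above has a single variant with InitialVariantWeight set to 1, so that variant receives all traffic. With several variants, the share each one receives is its weight divided by the sum of all weights. The sketch below illustrates that arithmetic only; the function name traffic_shares and the variant names in the usage example are hypothetical.

```python
def traffic_shares(variant_weights):
    """Return the fraction of traffic each variant receives.

    variant_weights maps a variant name to its weight. Each share is
    weight / sum(all weights).
    """
    total = sum(variant_weights.values())
    if total <= 0:
        raise ValueError("At least one variant needs a positive weight")
    return {name: weight / total for name, weight in variant_weights.items()}


# The tutorial's single variant takes all traffic
print(traffic_shares({"Alltraffic": 1}))

# Hypothetical two-variant split
print(traffic_shares({"VariantA": 3, "VariantB": 1}))
```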

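You can check the created endpoint configuration in the SageMaker console under the Endpoint configurations section.

Copy and paste the following code to create the endpoint. The create_endpoint method takes the endpoint configuration as a parameter and deploys the model specified in that configuration to a compute instance. Deploying the model takes about 6 minutes.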

endpoint_name = f"{model_name}-endpoint"

endpoint_matches = sm_client.list_endpoints(NameContains=endpoint_name)["Endpoints"]
if not endpoint_matches:
    endpoint_response = sm_client.create_endpoint(
                                                  EndpointName=endpoint_name,
                                                  EndpointConfigName=endpoint_config_name
                                                 )
else:
    print(f"Endpoint with name {endpoint_name} already exists! Change endpoint name to create new")

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
while status == "Creating":
    print(f"Endpoint Status: {status}...")
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
print(f"Endpoint Status: {status}")
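
The polling loop above keeps waiting for as long as the status is Creating. A possible variation, sketched here under the assumption that you want an upper bound on the wait, wraps the same describe_endpoint call in a function with a timeout. The name wait_for_endpoint and its parameters are hypothetical; the client is passed in so the function can be exercised without an AWS connection.

```python
import time


def wait_for_endpoint(client, endpoint_name, poll_seconds=30,
                      timeout_seconds=900, sleep=time.sleep):
    """Poll describe_endpoint until the status is no longer "Creating".

    Returns the final status (for example "InService" or "Failed").
    Raises TimeoutError if the endpoint is still creating after
    timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        resp = client.describe_endpoint(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        if status != "Creating":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{endpoint_name} still Creating after {timeout_seconds}s")
        sleep(poll_seconds)


# Usage in the notebook (requires the sm_client and endpoint_name defined earlier):
# status = wait_for_endpoint(sm_client, endpoint_name)
```

Boto3 also provides a built-in waiter for this case, obtained with sm_client.get_waiter("endpoint_in_service"), which may be preferable when you do not need custom status handling.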

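To check the endpoint's status, choose the SageMaker resources icon. For SageMaker resources, choose Endpoints, and for the name, choose fraud-detect-xgb-endpoint.

Step 4: Invoke the inference endpoint

After the endpoint status changes to InService, you can invoke the endpoint using the REST API, the AWS SDK for Python (Boto3), SageMaker Studio, the AWS CLI, or the SageMaker Python SDK. In this tutorial, you use Boto3. Before calling an endpoint, the test data must be put into a format the endpoint accepts, through serialization and deserialization. Serialization converts raw data in a format such as .csv into a byte stream that the endpoint can consume. Deserialization is the reverse process of converting a byte stream into a human-readable format. In this tutorial, you invoke the endpoint by sending the first five samples from the test dataset. To invoke the endpoint and get prediction results, copy and paste the following code. Because the request to the endpoint (the test dataset) is in .csv format, a CSV serialization process creates the payload. The response is then deserialized into an array of predictions. After execution completes, the cell returns the model predictions and the true labels for the test samples. Note that the XGBoost model returns probabilities rather than actual class labels. The model predicts a very low likelihood that the test samples are fraudulent claims, and the predictions agree with the true labels.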

# Fetch test data to run predictions with the endpoint
test_df = pd.read_csv(test_data_uri)

# For content type text/csv, payload should be a string with commas separating the values for each feature
# This is the inference request serialization step
# CSV serialization
csv_file = io.StringIO()
test_sample = test_df.drop(["fraud"], axis=1).iloc[:5]
test_sample.to_csv(csv_file, sep=",", header=False, index=False)
payload = csv_file.getvalue()
response = sm_runtime_client.invoke_endpoint(
                                             EndpointName=endpoint_name,
                                             Body=payload,
                                             ContentType="text/csv",
                                             Accept="text/csv"
                                            )

# This is the inference response deserialization step
# This is a bytes object
result = response["Body"].read()
# Decoding bytes to a string
result = result.decode("utf-8")
# Converting to list of predictions
result = re.split(r",|\n", result)

prediction_df = pd.DataFrame()
prediction_df["Prediction"] = result[:5]
prediction_df["Label"] = test_df["fraud"].iloc[:5].values
prediction_df
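
The serialization and deserialization steps described above can be sketched without pandas or a live endpoint. The example below uses only the standard library. The function names and the 0.5 threshold are assumptions made for illustration and are not taken from the tutorial; the threshold step shows one way to turn the returned probabilities into class labels.

```python
import csv
import io


def serialize_rows(rows):
    """Serialize rows of feature values into a headerless CSV payload string."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def parse_predictions(body_bytes):
    """Deserialize a text/csv response body into a list of floats.

    Values may be separated by commas or newlines.
    """
    text = body_bytes.decode("utf-8")
    tokens = text.replace("\n", ",").split(",")
    return [float(token) for token in tokens if token.strip()]


def to_labels(probabilities, threshold=0.5):
    """Map probabilities to 0/1 labels using an assumed cutoff."""
    return [int(p >= threshold) for p in probabilities]


payload = serialize_rows([[1, 2.5], [3, 4]])
probabilities = parse_predictions(b"0.02,0.9\n0.4\n")
print(repr(payload))
print(probabilities, to_labels(probabilities))
```

Filtering out empty tokens avoids the empty string that a trailing newline would otherwise leave at the end of the list.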

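To monitor the endpoint invocation metrics using Amazon CloudWatch, open the SageMaker console. Under Inference, choose Endpoints, and then choose fraud-detect-xgb-endpoint.

On the Endpoint details page, under Monitor, choose View invocation metrics. Initially, you might see only a single dot in the metrics chart. After multiple invocations, you will see a line similar to the one in the sample screenshot.

The Metrics page shows multiple endpoint performance metrics. You can choose different time periods, such as 1 hour or 3 hours, to visualize the endpoint performance. Select any metric to see its trend over the chosen time period. In the next step, you choose one of these metrics to define the autoscaling policy.

Because data capture was set up in the endpoint configuration, you can inspect which payload was sent to the endpoint, along with its response. The captured data takes some time to be fully uploaded to S3. Copy and paste the following code to check whether data capture is complete.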

from sagemaker.s3 import S3Downloader
print("Waiting for captures to show up", end="")
for _ in range(90):
    capture_files = sorted(S3Downloader.list(f"{data_capture_uri}/{endpoint_name}"))
    if capture_files:
        capture_file = S3Downloader.read_file(capture_files[-1]).split("\n")
        capture_record = json.loads(capture_file[0])
        if "inferenceId" in capture_record["eventMetadata"]:
            break
    print(".", end="", flush=True)
    time.sleep(1)
print()
print(f"Found {len(capture_files)} Data Capture Files:")

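The captured data is stored in S3 as a separate file for each endpoint invocation, in JSON Lines format. JSON Lines is a newline-delimited format for structured data in which each line is a JSON value. Copy and paste the following code to retrieve the data capture files.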

capture_files = sorted(S3Downloader.list(f"{data_capture_uri}/{endpoint_name}"))
capture_file = S3Downloader.read_file(capture_files[0]).split("\n")
capture_record = json.loads(capture_file[0])
capture_record
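
The code above parses only the first line of the capture file. As a small sketch of reading every record from JSON Lines text, assuming the file content is already in memory as a string, the hypothetical helper below parses each non-empty line as its own JSON value.

```python
import json


def parse_json_lines(text):
    """Parse JSON Lines text into a list, one JSON value per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Synthetic two-record example, not real capture output
records = parse_json_lines('{"id": 1}\n{"id": 2}\n')
print(len(records), records[0])
```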

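Copy and paste the following code to decode the data in the capture file using base64. The code retrieves the five test samples that were sent as the payload, along with their predictions. This feature is useful for inspecting endpoint payloads together with model responses and for monitoring model performance.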

input_data = capture_record["captureData"]["endpointInput"]["data"]
output_data = capture_record["captureData"]["endpointOutput"]["data"]
input_data_list = base64.b64decode(input_data).decode("utf-8").split("\n")
print(input_data_list)
output_data_list = base64.b64decode(output_data).decode("utf-8").split("\n")
print(output_data_list)
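
To see the base64 round trip in isolation, the sketch below builds a synthetic record with the same captureData, endpointInput, and endpointOutput keys used above and decodes it. The record contents are made up for illustration, and the helper name decode_capture is hypothetical.

```python
import base64


def decode_capture(record):
    """Return (input_text, output_text) decoded from a capture record."""
    data = record["captureData"]

    def decode(section):
        return base64.b64decode(data[section]["data"]).decode("utf-8")

    return decode("endpointInput"), decode("endpointOutput")


# Synthetic record; real records carry additional metadata fields
sample_record = {
    "captureData": {
        "endpointInput": {"data": base64.b64encode(b"1,2,3\n").decode("ascii")},
        "endpointOutput": {"data": base64.b64encode(b"0.02\n").decode("ascii")},
    }
}

request_text, response_text = decode_capture(sample_record)
print(repr(request_text), repr(response_text))
```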

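Step 5: Configure autoscaling for the endpoint

Workloads that use real-time inference endpoints usually have low-latency requirements. In addition, when traffic spikes, real-time inference endpoints can experience CPU overload, high latency, or timeouts. It is therefore important to scale capacity so that traffic changes are handled efficiently with low latency. SageMaker inference autoscaling monitors your workloads and dynamically adjusts the instance count to maintain steady and predictable endpoint performance at a low cost. When the workload increases, autoscaling brings more instances online, and when the workload decreases, it removes unnecessary instances, helping you reduce your compute cost. In this tutorial, you use the AWS SDK for Python (Boto3) to set up autoscaling for your endpoint. SageMaker provides multiple types of autoscaling: target tracking scaling, step scaling, on-demand scaling, and scheduled scaling. In this tutorial, you use a target tracking scaling policy, which is triggered when a chosen scaling metric rises above a chosen target threshold.

Autoscaling is set up in two steps. First, you configure the scaling settings with the minimum, desired, and maximum number of instances per endpoint. Copy and paste the following code to register the endpoint variant as a scalable target. When traffic exceeds the chosen threshold, which you choose in the next step, instances are launched up to the specified maximum.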

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)

# SageMaker expects resource id to be provided with the following structure
resource_id = f"endpoint/{endpoint_name}/variant/{resp['ProductionVariants'][0]['VariantName']}"

# Scaling configuration
scaling_config_response = sm_autoscaling_client.register_scalable_target(
                                                          ServiceNamespace="sagemaker",
                                                          ResourceId=resource_id,
                                                          ScalableDimension="sagemaker:variant:DesiredInstanceCount", 
                                                          MinCapacity=1,
                                                          MaxCapacity=2
                                                        )

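Copy and paste the following code to create the scaling policy. The chosen scaling metric is SageMakerVariantInvocationsPerInstance, which is the average number of times per minute that each inference instance for a model variant is invoked. When this number exceeds the chosen threshold of 5, autoscaling is triggered.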

# Create Scaling Policy
policy_name = f"scaling-policy-{endpoint_name}"
scaling_policy_response = sm_autoscaling_client.put_scaling_policy(
                                                PolicyName=policy_name,
                                                ServiceNamespace="sagemaker",
                                                ResourceId=resource_id,
                                                ScalableDimension="sagemaker:variant:DesiredInstanceCount",
                                                PolicyType="TargetTrackingScaling",
                                                TargetTrackingScalingPolicyConfiguration={
                                                    "TargetValue": 5.0, # Target for avg invocations per minutes
                                                    "PredefinedMetricSpecification": {
                                                        "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
                                                    },
                                                    "ScaleInCooldown": 600, # Duration in seconds until scale in
                                                    "ScaleOutCooldown": 60 # Duration in seconds between scale out
                                                }
                                            )
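
For intuition about how the target value of 5 interacts with the capacity bounds registered earlier, the sketch below gives a back-of-the-envelope estimate of how many instances would keep the per-instance invocation rate at or below the target. It is a simplification written for this walkthrough, not the algorithm the autoscaling service actually runs, and the function name estimate_instances is hypothetical.

```python
import math


def estimate_instances(total_invocations_per_minute, target_per_instance,
                       min_capacity, max_capacity):
    """Estimate the instance count that keeps invocations per instance
    at or below the target, clamped to [min_capacity, max_capacity]."""
    needed = math.ceil(total_invocations_per_minute / target_per_instance)
    return max(min_capacity, min(max_capacity, needed))


# 12 invocations per minute against a target of 5 would need 3 instances,
# but MaxCapacity=2 caps the result
print(estimate_instances(12, 5.0, min_capacity=1, max_capacity=2))

# 3 invocations per minute fit on a single instance
print(estimate_instances(3, 5.0, min_capacity=1, max_capacity=2))
```

In practice, cooldown periods and metric aggregation delay mean the real instance count lags behind such an estimate.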

Copy and paste the following code to retrieve the scaling policy details.

response = sm_autoscaling_client.describe_scaling_policies(ServiceNamespace="sagemaker")

pp = pprint.PrettyPrinter(indent=4, depth=4)
for i in response["ScalingPolicies"]:
    pp.pprint(i["PolicyName"])
    print("")
    if("TargetTrackingScalingPolicyConfiguration" in i):
        pp.pprint(i["TargetTrackingScalingPolicyConfiguration"])

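Copy and paste the following code to stress-test the endpoint. The code runs for 250 seconds and invokes the endpoint repeatedly by sending randomly selected samples from the test dataset.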

request_duration = 250
end_time = time.time() + request_duration
print(f"Endpoint will be tested for {request_duration} seconds")
while time.time() < end_time:
    csv_file = io.StringIO()
    test_sample = test_df.drop(["fraud"], axis=1).iloc[[np.random.randint(0, test_df.shape[0])]]
    test_sample.to_csv(csv_file, sep=",", header=False, index=False)
    payload = csv_file.getvalue()
    response = sm_runtime_client.invoke_endpoint(
                                                 EndpointName=endpoint_name,
                                                 Body=payload,
                                                 ContentType="text/csv"
                                                )

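You can monitor the endpoint metrics using Amazon CloudWatch. For a list of available endpoint metrics, including invocations, see SageMaker Endpoint Invocation Metrics. On the SageMaker console, under Inference, choose Endpoints, and then choose fraud-detect-xgb-endpoint. On the Endpoint details page, navigate to the Monitor section and choose View invocation metrics. On the Metrics page, select InvocationsPerInstance (the monitoring metric you chose when setting up the scaling policy) and Invocations from the list of metrics, and then choose the Graphed metrics tab.

On the Graphed metrics page, you can visually inspect the traffic pattern received by the endpoint and change the time granularity, for example from the default 5 minutes to 1 minute. It may take a few minutes for autoscaling to add the second instance. Once the new instance is added, you will notice that invocations per instance are half of the total invocations.

Once the endpoint has received the increased load, you can check the endpoint's status by running the following code. This code checks when the endpoint status changes from InService to Updating and keeps track of the instance count. After a few minutes, you should see the status change from InService to Updating and back to InService, but with a higher instance count.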

# Check the instance counts after the endpoint gets more load
response = sm_client.describe_endpoint(EndpointName=endpoint_name)
endpoint_status = response["EndpointStatus"]
request_duration = 250
end_time = time.time() + request_duration
print(f"Waiting for Instance count increase for a max of {request_duration} seconds. Please re run this cell in case the count does not change")
while time.time() < end_time:
    response = sm_client.describe_endpoint(EndpointName=endpoint_name)
    endpoint_status = response["EndpointStatus"]
    instance_count = response["ProductionVariants"][0]["CurrentInstanceCount"]
    print(f"Status: {endpoint_status}")
    print(f"Current Instance count: {instance_count}")
    if (endpoint_status=="InService") and (instance_count>1):
        break
    else:
        time.sleep(15)

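Step 6: Clean up resources

It is a best practice to delete resources that you are no longer using so that you don't incur unintended charges.

Delete the model, endpoint configuration, and endpoint you created in this tutorial by running the following code block in your notebook. If you do not delete the endpoint, your account keeps accumulating charges for the compute instance running at the endpoint.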

# Delete model
sm_client.delete_model(ModelName=model_name)

# Delete endpoint configuration
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)

# Delete endpoint
sm_client.delete_endpoint(EndpointName=endpoint_name)
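
The cleanup code above does not touch the autoscaling registration created in Step 5. If you want to remove it explicitly, one possible sketch is shown below. It assumes the same resource ID structure and scalable dimension used in Step 5 and calls the Application Auto Scaling delete_scaling_policy and deregister_scalable_target operations before deleting the SageMaker resources. The function name cleanup and its parameters are hypothetical, and the clients are passed in as arguments.

```python
def cleanup(sm_client, autoscaling_client, *, model_name, endpoint_config_name,
            endpoint_name, variant_name, policy_name):
    """Remove the autoscaling setup, then the endpoint, its config, and the model."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    # Remove the scaling policy first, then the scalable target it was attached to
    autoscaling_client.delete_scaling_policy(
        PolicyName=policy_name,
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
    )
    autoscaling_client.deregister_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
    )

    # Delete the billable endpoint, then its configuration and the model
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sm_client.delete_model(ModelName=model_name)


# Usage in the notebook (names come from the earlier steps):
# cleanup(sm_client, sm_autoscaling_client, model_name=model_name,
#         endpoint_config_name=endpoint_config_name, endpoint_name=endpoint_name,
#         variant_name="Alltraffic", policy_name=policy_name)
```

Run either this function or the tutorial's three delete calls, not both, because a second delete of the same resource raises an error.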

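To delete the S3 bucket, do the following:

Open the Amazon S3 console. On the navigation bar, choose Buckets, then sagemaker-<your-Region>-<your-account-id>, and then select the checkbox next to fraud-detect-demo. Then choose Delete.
In the Delete objects dialog box, verify that you have selected the proper object to delete, and enter permanently delete into the Permanently delete objects confirmation box.
Once this is complete and the bucket is empty, you can delete the sagemaker-<your-Region>-<your-account-id> bucket by following the same procedure again.

The Data Science kernel used to run the notebook image in this tutorial accumulates charges until you either stop the kernel or perform the following steps to delete the apps. For more information, see Shut Down Resources in the Amazon SageMaker Developer Guide.

To delete the SageMaker Studio apps, do the following: On the SageMaker Studio console, choose studio-user, and then delete all the apps listed under Apps by choosing Delete app. Wait until the status changes to Deleted.

If you used an existing SageMaker Studio domain in Step 1, skip the rest of Step 6 and proceed directly to the Conclusion section.

If you ran the CloudFormation template in Step 1 to create a new SageMaker Studio domain, continue with the following steps to delete the domain, user, and resources created by the CloudFormation template.

To open the CloudFormation console, enter CloudFormation into the AWS Management Console search bar, and choose CloudFormation from the search results.

On the CloudFormation pane, choose Stacks. From the Status dropdown list, select Active. Under Stack name, choose CFN-SM-IM-Lambda-catalog to open the stack details page.

On the CFN-SM-IM-Lambda-catalog stack details page, choose Delete to delete the stack along with the resources it created in Step 1.

Conclusion

Congratulations! You have finished the Deploy a Machine Learning Model to a Real-Time Inference Endpoint tutorial.

In this tutorial, you created a SageMaker model and deployed it to a real-time inference endpoint. You used the AWS SDK for Python (Boto3) API to invoke the endpoint and tested it by running sample inferences, while using the data capture feature to save endpoint payloads and responses to S3. Finally, you configured autoscaling using a target endpoint invocation metric to handle traffic fluctuations.

You can continue your machine learning journey with SageMaker by following the next steps below.

Train a deep learning model

Learn how to build, train, and tune a TensorFlow deep learning model.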

Create an ML model automatically

Learn how to use AutoML to develop ML models without writing code.