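Reposted

Common Retry Techniques: How to Retry Gracefully

Background

In a distributed environment, retrying is one part of high-availability engineering. When calling an RPC interface or sending an MQ message, we usually retry to cope with network jitter and request timeouts. Hand-written retry logic is rarely elegant. Many libraries and frameworks now implement retries, but each has its own strengths and weaknesses. This article surveys them in search of a reasonably elegant approach.

Retry behavior has to be designed around the use case. Read interfaces are well suited to retries, while write interfaces require attention to idempotency. Too many retries also multiply the request volume and put more pressure on the backend, so a sensible retry configuration is key.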

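Retry implementations

This article covers the more common retry implementations:

1. The Spring Retry framework;

2. The Guava Retry framework;

3. Spring Cloud retry configuration.

Their usage is summarized below:

1. Spring Retry

Spring Retry can be used in two ways:

Annotation style

The simplest way: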

@Retryable(value = RuntimeException.class,maxAttempts = 3, backoff = @Backoff(delay = 5000L, multiplier = 2))

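Set the exception to retry on, the retry policy, and the fallback, and you get the whole flow from retry to fallback. This standard usage is well covered in online articles.

Here we look at a case where we handle the fallback ourselves: instead of using @Recover as the fallback, we let the exception propagate outward. The code is roughly as follows:

Retrying the method in the Service: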

    @Override
    @Transactional
    @Retryable(value = ZcSupplyAccessException.class, maxAttempts = 3, backoff = @Backoff(delay = 2000, multiplier = 1.5))
    public OutputParamsDto doZcSupplyAccess(InputParamsDto inputDto) throws ZcSupplyAccessException {
        // 1. Validation
       ....
        // 2. Data conversion
      ....
        // 3. Storage
        try {
            doSaveDB(ioBusIcsRtnDatList);
            log.info("3.XXX - data access storage completed");
        } catch (Exception e) {
            // Log the cause with its stack trace, then throw the exception type that @Retryable watches for
            log.error("3.XXX - data access storage failed", e);
            throw new ZcSupplyAccessException("XXX data access storage failed");
        }
        return new OutputParamsDto(true, "XXX processed successfully");
    }

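Catch the exceptions in the Controller. Note that different exceptions need different handling here, so they should not be handled in @Recover; otherwise the outer layer can no longer tell the exceptions apart.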

    @PostMapping("/accessInfo")
    public OutputParamsDto accessInfo(@RequestBody InputParamsDto inputDto) {

        log.info("Incoming message: " + JSONUtil.serialize(inputDto));
        OutputParamsDto output = validIdentity(inputDto);
        if (output == null || !output.getSuccess()) {
            return output;
        }
        log.info("Pre.1. Security authentication passed");
        IAccessService accessService = null;
        try {
            ....
            accessService = (IAccessService) ApplicationContextBeansHolder.getBean(param.getParmVal());
            // Forward first (the exception must be handled)
            output = accessService.doZcSupplyTranfer(inputDto);
            // Then store (the exception is only logged)
            accessService.doZcSupplyAccess(inputDto);
        } catch (ZcSupplyTransferException e) {
            log.error("Forwarding to the downstream MQ failed after 3 attempts; check whether the MQ service is unavailable", e);
            return new OutputParamsDto(false, "Forwarding to the downstream MQ failed after 3 attempts; check whether the MQ service is unavailable");
        } catch (ZcSupplyAccessException e) {
            log.error("Access storage failed after 3 attempts; check whether the database is unavailable", e);
        } catch (Exception e) {
            log.error("Exception while looking up the bean by name and invoking it", e);
            return new OutputParamsDto(false, "Exception while looking up the bean by name and invoking it");
        }
        ...

        return output;

    }

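Notes:

1. A @Recover method must not throw an Exception again; otherwise an error is reported saying the exception cannot be recognized.

2. When a method is retried through the annotation, the retry logic runs synchronously, and a "failure" means a Throwable. If you want to decide whether to retry based on some state in the return value, you will probably have to check the return value yourself and throw an exception explicitly.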
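The second note can be illustrated with a minimal, framework-free sketch. All names here (`ReturnValueRetryDemo`, `Result`, `BadStatusException`, `remoteCall`) are hypothetical, and the small `retry` loop only stands in for what @Retryable does; it is not Spring Retry's implementation. The point is the wrapper method that turns a failed status into an exception so that exception-driven retry can react to it.

```java
import java.util.function.Supplier;

// Hypothetical names throughout; the retry loop only stands in for @Retryable.
final class ReturnValueRetryDemo {

    /** Thrown when a call returns normally but reports a failed status. */
    static final class BadStatusException extends RuntimeException {
        BadStatusException(String message) {
            super(message);
        }
    }

    /** Hypothetical result type carrying a success flag. */
    static final class Result {
        final boolean success;

        Result(boolean success) {
            this.success = success;
        }
    }

    static int calls = 0;

    /** Simulated remote call: reports failure on the first two calls, then succeeds. */
    static Result remoteCall() {
        calls++;
        return new Result(calls >= 3);
    }

    /** Converts a failed status into an exception so exception-driven retry can see it. */
    static Result callOrThrow() {
        Result result = remoteCall();
        if (!result.success) {
            throw new BadStatusException("remote call reported failure");
        }
        return result;
    }

    /** Exception-driven retry: tries again only when the action throws. */
    static <T> T retry(int maxAttempts, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // attempts exhausted: propagate the last failure
    }

    public static void main(String[] args) {
        Result result = retry(3, ReturnValueRetryDemo::callOrThrow);
        System.out.println("success=" + result.success + ", calls=" + calls);
    }
}
```

In a Spring Retry setup, the body of `callOrThrow` would sit inside the annotated method, with the annotation naming the thrown exception type.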

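Programmatic style

The annotation style is convenient but has limits. A retry can only be triggered by an exception, not by a returned object. When several Recover methods are defined, it is hard to control which one is used, and overriding in parent and child classes needs particular care in structured designs. Spring Retry also offers a programmatic style that is more flexible and lets you return a custom object for later processing.

In the code below the RecoveryCallback throws an exception. It could return an object instead, which is friendlier than the annotation style.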

import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson.serializer.SerializerFeature;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.cloud.context.config.annotation.RefreshScope;
import org.springframework.retry.RecoveryCallback;
import org.springframework.retry.RetryCallback;
import org.springframework.retry.RetryContext;
import org.springframework.retry.backoff.ExponentialBackOffPolicy;
import org.springframework.retry.backoff.FixedBackOffPolicy;
import org.springframework.retry.policy.CircuitBreakerRetryPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;
import org.springframework.stereotype.Component;

import java.time.LocalTime;
import java.util.Collections;
import java.util.Map;

/**
 * <p>
 * System <br>
 * <br>
 * Created by    on 2019/9/10 16:12  <br>
 * Revised by [reviser] on [revision date] for [revision notes]<br>
 * </p>
 */
@Slf4j
@Component
@RefreshScope
public class ZcSupplySynRemoteRetryHandler {

    @Autowired
    RestTemplateFactory restTemplateFactory;

    final RetryTemplate retryTemplate = new RetryTemplate();

    // Simple retry policy: up to 3 attempts, retrying only on ZcSupplySynRemoteException
    final SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy(3, Collections.<Class<? extends Throwable>, Boolean>
            singletonMap(ZcSupplySynRemoteException.class, true));

    @Value("${retry.initialInterval}")
    private String initialInterval;

    @Value("${retry.multiplier}")
    private String multiplier;

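    /**
     * Calls the remote interface with retries.
     *
     * @param reqMap request parameters
     * @param url    target URL
     * @return the response map of the remote call
     * @throws ZcSupplySynRemoteException if all attempts fail
     */
    public Map<String, Object> doSyncWithRetry(Map<String, Object> reqMap, String url) throws ZcSupplySynRemoteException {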
        // Circuit-breaker retry policy (built here for illustration only; it is not applied to retryTemplate below)
        CircuitBreakerRetryPolicy cbRetryPolicy = new CircuitBreakerRetryPolicy(new SimpleRetryPolicy(3));
        cbRetryPolicy.setOpenTimeout(3000);
        cbRetryPolicy.setResetTimeout(10000);

        // Fixed back-off policy (also for illustration only; the exponential policy is the one applied below)
        FixedBackOffPolicy fixedBackOffPolicy = new FixedBackOffPolicy();
        fixedBackOffPolicy.setBackOffPeriod(100);

        // Exponential back-off policy, configured from retry.initialInterval and retry.multiplier
        ExponentialBackOffPolicy exponentialBackOffPolicy = new ExponentialBackOffPolicy();
        exponentialBackOffPolicy.setInitialInterval(Long.parseLong(initialInterval));
        exponentialBackOffPolicy.setMultiplier(Double.parseDouble(multiplier));

        // Apply the policies
        retryTemplate.setRetryPolicy(retryPolicy);
        retryTemplate.setBackOffPolicy(exponentialBackOffPolicy);
        // Retry callback
        RetryCallback<Map<String, Object>, ZcSupplySynRemoteException> retryCallback = new RetryCallback<Map<String, Object>, ZcSupplySynRemoteException>() {
            /**
             * Execute an operation with retry semantics. Operations should generally be
             * idempotent, but implementations may choose to implement compensation
             * semantics when an operation is retried.
             *
             * @param context the current retry context.
             * @return the result of the successful operation.
             * @throws ZcSupplySynRemoteException of type E if processing fails
             */
            @Override
            public Map<String, Object> doWithRetry(RetryContext context) throws ZcSupplySynRemoteException {
                try {
                    log.info(String.valueOf(LocalTime.now()));
                    Map<String, Object> rtnMap = (Map<String, Object>) restTemplateFactory.callRestService(url,
                            JSONObject.toJSONString(reqMap, SerializerFeature.WriteMapNullValue));
                    context.setAttribute("rtnMap",rtnMap);
                    return rtnMap;
                } catch (Exception e) {
                    throw new ZcSupplySynRemoteException("Error calling the procurement sync interface, preparing to retry");
                }
            }
        };

        // Fallback (recovery) callback
        RecoveryCallback<Map<String, Object>> recoveryCallback = new RecoveryCallback<Map<String, Object>>() {
            /**
             * @param context the current retry context
             * @return an Object that can be used to replace the callback result that failed
             * @throws ZcSupplySynRemoteException when something goes wrong
             */
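            @Override
            public Map<String, Object> recover(RetryContext context) throws ZcSupplySynRemoteException {
                Map<String, Object> rtnMap = (Map<String, Object>) context.getAttribute("rtnMap");
                log.info("xxx failed on all 3 attempts; check whether the remote service is available. Call result: {}", JSONObject.toJSONString(rtnMap, SerializerFeature.WriteMapNullValue));

                // Note: an exception can be thrown here, which the annotation style does not allow; use this style when the outer layer needs to handle it
                throw new ZcSupplySynRemoteException("xxx failed on all 3 attempts; check whether the remote service is available.");
            }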
        };

        return retryTemplate.execute(retryCallback, recoveryCallback);
    }
}

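Core classes

RetryCallback: wraps the business logic you want to retry;

RecoveryCallback: wraps the business logic to run after all retries have failed;

RetryContext: the context of a retry, which can pass parameters or state between retries, or between Retry and Recover;

RetryOperations: defines the basic retry framework (template); it requires a RetryCallback and optionally takes a RecoveryCallback;

RetryListener: a typical listener, notified at the different stages of a retry;

RetryPolicy: the retry policy or condition; it can simply retry a number of times, or retry within a specified timeout;

BackOffPolicy: the back-off policy for retries. When the business logic throws and a retry is needed, we may want to wait a while first (the server may be overloaded, and retrying without any pause could bring it down). The wait can be 0, fixed, or random (compare the back-off strategy in TCP congestion control). Conceptually, the back-off policy is the wait() step between attempts;

RetryTemplate: the concrete implementation of RetryOperations, combining RetryListener[], BackOffPolicy, and RetryPolicy.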

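Retry policies

NeverRetryPolicy: allows the RetryCallback to be called only once, with no retries

AlwaysRetryPolicy: allows unlimited retries until success; careless logic here can cause an infinite loop

SimpleRetryPolicy: retries a fixed number of times, with a default maximum of 3 attempts; this is the policy RetryTemplate uses by default

TimeoutRetryPolicy: retries within a timeout; the default timeout is 1 second, and retries are allowed until it expires

ExceptionClassifierRetryPolicy: sets a different retry policy per exception; similar to the composite retry policy, except that it only distinguishes between exceptions

CircuitBreakerRetryPolicy: a retry policy with circuit breaking; it takes three parameters: openTimeout, resetTimeout, and delegate

CompositeRetryPolicy: a composite retry policy with two modes. In optimistic mode, a retry is allowed as long as one policy allows it. In pessimistic mode, retrying stops as soon as one policy disallows it. In either mode, every policy in the composition is executed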
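The difference between the two composite modes can be sketched in plain Java. This is only an illustration of the decision rule described above, not Spring Retry's CompositeRetryPolicy code; the class name and the use of `IntPredicate` (a policy that answers "may we retry?" given the number of attempts so far) are assumptions made for the example. Both methods deliberately consult every policy rather than stopping at the first answer.

```java
import java.util.List;
import java.util.function.IntPredicate;

// Illustrative only: each policy answers "may we retry?" for a given attempt count.
final class CompositeDecisionDemo {

    /** Optimistic mode: retry is allowed if at least one policy allows it. */
    static boolean optimistic(List<IntPredicate> policies, int attemptsSoFar) {
        boolean anyAllows = false;
        for (IntPredicate policy : policies) {
            // No early exit: every policy is consulted
            anyAllows |= policy.test(attemptsSoFar);
        }
        return anyAllows;
    }

    /** Pessimistic mode: retry is allowed only if no policy disallows it. */
    static boolean pessimistic(List<IntPredicate> policies, int attemptsSoFar) {
        boolean allAllow = true;
        for (IntPredicate policy : policies) {
            // No early exit: every policy is consulted
            allAllow &= policy.test(attemptsSoFar);
        }
        return allAllow;
    }

    public static void main(String[] args) {
        // Two count-based policies: one allows up to 3 attempts, the other up to 5
        List<IntPredicate> policies = List.of(attempts -> attempts < 3, attempts -> attempts < 5);

        int attemptsSoFar = 3;
        System.out.println("optimistic allows retry:  " + optimistic(policies, attemptsSoFar));
        System.out.println("pessimistic allows retry: " + pessimistic(policies, attemptsSoFar));
    }
}
```

After three attempts the stricter policy is exhausted, so the pessimistic combination stops while the optimistic one keeps going until the five-attempt policy is exhausted as well.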

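Back-off policies

The back-off policy decides whether each retry happens immediately or after a wait.

By default retries happen immediately. To wait before retrying, you need to specify a BackOffPolicy.

NoBackOffPolicy: no back-off; every retry happens immediately

FixedBackOffPolicy: fixed-delay back-off, with the parameters sleeper and backOffPeriod. sleeper specifies how to wait and defaults to Thread.sleep, i.e. the thread sleeps; backOffPeriod specifies the sleep time and defaults to 1 second

UniformRandomBackOffPolicy: random-delay back-off, with the parameters sleeper, minBackOffPeriod, and maxBackOffPeriod. The policy picks a random sleep time in [minBackOffPeriod, maxBackOffPeriod]; minBackOffPeriod defaults to 500 ms and maxBackOffPeriod to 1500 ms

ExponentialBackOffPolicy: exponential back-off, with the parameters sleeper, initialInterval, maxInterval, and multiplier. initialInterval is the initial sleep time (default 100 ms), maxInterval is the maximum sleep time (default 30 seconds), and multiplier is the factor, so the next sleep time is the current sleep time * multiplier

ExponentialRandomBackOffPolicy: randomized exponential back-off; it introduces a random multiplier so that the back-off intervals are randomized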
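To make the exponential rule concrete, here is a small sketch that computes the sleep times between attempts from initialInterval, multiplier, and maxInterval as described above. It is a plain-Java approximation, not Spring Retry's own code; the class and method names are made up for the example, and it assumes that N attempts leave N - 1 gaps to sleep in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the rule "next sleep = current sleep * multiplier, capped at maxInterval".
final class BackoffScheduleDemo {

    /** Returns the sleep time in milliseconds before each retry. */
    static List<Long> exponentialDelays(long initialInterval, double multiplier,
                                        long maxInterval, int maxAttempts) {
        List<Long> delays = new ArrayList<>();
        long interval = initialInterval;
        // maxAttempts calls leave maxAttempts - 1 gaps between them
        for (int gap = 1; gap < maxAttempts; gap++) {
            delays.add(Math.min(interval, maxInterval));
            interval = (long) (interval * multiplier);
        }
        return delays;
    }

    public static void main(String[] args) {
        // Values from the Service example: delay = 2000, multiplier = 1.5, maxAttempts = 3
        System.out.println(exponentialDelays(2000, 1.5, 30_000, 3));
        // Default-like values: 100 ms initial interval, multiplier 2, 30 s cap
        System.out.println(exponentialDelays(100, 2.0, 30_000, 5));
    }
}
```

With the Service example's values this yields waits of 2000 ms and 3000 ms, so the three attempts span roughly five seconds plus the time of the calls themselves.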

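2. Guava Retry

The Guava Retryer is similar to spring-retry: both define a retryer role that wraps the normal logic and retries it. Guava Retryer, however, has more flexible policy definitions. On top of controlling the number and frequency of retries, it lets the retry trigger be defined by multiple exceptions or by a custom result object, which makes the retry behavior more flexible.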
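The idea of retrying on a result as well as on an exception can be sketched without any library. The code below is not the guava-retrying API; `ResultRetrySketch` and its `call` method are hypothetical names, and the fixed wait between attempts is a simplifying assumption.

```java
import java.util.Objects;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Library-free sketch of result-based retry; not the guava-retrying API.
final class ResultRetrySketch {

    /**
     * Calls the task up to maxAttempts times. An attempt counts as failed
     * if it throws or if its result matches retryIfResult.
     */
    static <T> T call(Supplier<T> task, Predicate<T> retryIfResult, int maxAttempts, long waitMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T result = task.get();
                lastError = null;
                if (!retryIfResult.test(result)) {
                    return result; // acceptable result: stop retrying
                }
            } catch (RuntimeException e) {
                lastError = e; // failed attempt: remember the error
            }
            if (attempt < maxAttempts) {
                sleep(waitMillis);
            }
        }
        if (lastError != null) {
            throw lastError;
        }
        throw new IllegalStateException("result still rejected after " + maxAttempts + " attempts");
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated task: returns null twice, then a real value
        String value = call(() -> ++calls[0] < 3 ? null : "ok", Objects::isNull, 5, 10L);
        System.out.println(value + " after " + calls[0] + " calls");
    }
}
```

Here a null result is treated as a reason to retry, which an exception-only mechanism could not express without the explicit throw discussed for the annotation style.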

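3. Spring Cloud retry configuration

Spring Cloud Netflix provides several ways to make HTTP requests.

You can use a load-balanced RestTemplate, Ribbon, or Feign.

However you choose to create the HTTP request, the request may fail.

When a request fails, you may want it to be retried automatically.

To do this with Spring Cloud Netflix, you need to put Spring Retry on the application's classpath.

When Spring Retry is present, load-balanced RestTemplates, Feign, and Zuul automatically retry failed requests.

Global settings for RestTemplate + Ribbon: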

spring:
  cloud:
    loadbalancer:
      retry:
        enabled: true
ribbon:
  ReadTimeout: 6000
  ConnectTimeout: 6000
  MaxAutoRetries: 1
  MaxAutoRetriesNextServer: 2
  OkToRetryOnAllOperations: true

Settings for a specific service, service1:

service1:
  ribbon:
    MaxAutoRetries: 1
    MaxAutoRetriesNextServer: 2
    ConnectTimeout: 5000
    ReadTimeout: 2000
    OkToRetryOnAllOperations: true

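Setting	Description
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds	The circuit breaker timeout must be greater than the Ribbon timeout; otherwise retries are not triggered.
hello-service.ribbon.ConnectTimeout	Connection timeout for the request
hello-service.ribbon.ReadTimeout	Processing (read) timeout for the request
hello-service.ribbon.OkToRetryOnAllOperations	Whether to retry all operations
hello-service.ribbon.MaxAutoRetriesNextServer	Maximum number of retries on other load-balanced instances, not counting the first server
hello-service.ribbon.MaxAutoRetries	Maximum number of retries on the same instance, not counting the first call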
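When sizing the circuit breaker timeout against these settings, a rough upper bound on attempts and elapsed time can help. The sketch below assumes that each server is tried 1 + MaxAutoRetries times, that up to 1 + MaxAutoRetriesNextServer servers are tried, and that a single attempt can take as long as ConnectTimeout + ReadTimeout. The class and method names are made up for the example, and the result is an estimate rather than a guarantee of Ribbon's behavior.

```java
// Rough estimate only; assumes every attempt uses its full connect and read timeouts.
final class RibbonRetryBudget {

    /** Upper bound on the number of calls, including the first one. */
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    /** Worst-case elapsed time in milliseconds if every attempt times out. */
    static long worstCaseMillis(int maxAutoRetries, int maxAutoRetriesNextServer,
                                long connectTimeout, long readTimeout) {
        return (long) maxAttempts(maxAutoRetries, maxAutoRetriesNextServer)
                * (connectTimeout + readTimeout);
    }

    public static void main(String[] args) {
        // Values from the service1 example: MaxAutoRetries 1, MaxAutoRetriesNextServer 2,
        // ConnectTimeout 5000, ReadTimeout 2000
        System.out.println("attempts:   " + maxAttempts(1, 2));
        System.out.println("worst case: " + worstCaseMillis(1, 2, 5000, 2000) + " ms");
    }
}
```

Under these assumptions, a circuit breaker timeout shorter than the worst-case figure can cut a retry sequence off before it finishes.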

Complete Feign retry configuration (yml)

eureka:
  client:
    serviceUrl:
      defaultZone: http://localhost:8761/eureka/
server:
  port: 7001
spring:
  application:
    name: feign-service

feign:
  hystrix:
    enabled: true


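client1:
  ribbon:
    # Retry once on the first server
    MaxAutoRetries: 1
    # Retry on up to two other servers
    MaxAutoRetriesNextServer: 2
    # Connection timeout
    ConnectTimeout: 500
    # Request processing (read) timeout
    ReadTimeout: 2000
    # Enable retries for every operation
    OkToRetryOnAllOperations: true

# Circuit breaker timeout; the default is 1000 (1 second)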
hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 2001