A Brief Analysis of How ANR Works

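I recently ran into an ANR and had to track it down. ANRs are genuinely painful: even with the traces.txt log in hand you sometimes cannot see what went wrong, because an ANR usually comes with high CPU and memory usage, which makes the cause hard to pin down.

I spent some time learning what triggers an ANR on Android and how the system detects one. These notes are a record for future reference; the palest ink beats the best memory.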

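1. Introduction to ANR

ANR (Application Not Responding) means that an app has failed to respond within a certain amount of time.

The root cause is that the UI thread either cannot get to a message for a long time or takes too long to process one.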
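To make the root cause concrete, here is a minimal sketch in plain Java rather than Android framework code. It assumes a single-threaded executor can stand in for the UI thread and its message queue; UiThreadProbe, respondsWithin, and demo are hypothetical names used only for illustration. A probe message is queued behind whatever the "UI thread" is doing, and if the probe is not handled within the timeout, the thread counts as unresponsive.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Plain-Java analogy, not Android framework code: a single-threaded executor
// stands in for the UI thread and its message queue.
final class UiThreadProbe {

    // Posts a probe message and reports whether it was handled within timeoutMs.
    static boolean respondsWithin(ExecutorService uiThread, long timeoutMs) {
        CountDownLatch handled = new CountDownLatch(1);
        uiThread.execute(handled::countDown); // queued behind any pending work
        try {
            return handled.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Returns {responsive while idle, responsive while blocked}.
    static boolean[] demo() {
        ExecutorService ui = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            boolean idle = respondsWithin(ui, 1000);
            // Simulate a message that blocks the "UI thread".
            ui.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            boolean blocked = respondsWithin(ui, 200);
            return new boolean[] { idle, blocked };
        } finally {
            release.countDown();
            ui.shutdownNow();
        }
    }
}
```

The blocking task plays the part of a slow message handler: the probe itself is trivial, but it cannot run until the work ahead of it finishes.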

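2. Common types of ANR

There are three main types:

  • InputEvent (input events): 5s
  • Service: 20s for foreground services, 200s for background services
  • Broadcast: 10s for the foreground queue, 60s for the background queue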

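3. How Service ANRs are detected

Service ANR detection relies on delayed messages.

If you have studied how a Service is started, you know that AMS only dispatches the work; ActiveServices is what actually handles starting the Service.

ActiveServices has a scheduleServiceTimeoutLocked method, which is called when a service is being created.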

// ActiveServices
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    mAm.mHandler.sendMessageDelayed(msg,
            proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}

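Once a service starts executing, a delayed ActivityManagerService.SERVICE_TIMEOUT_MSG is posted, and it keeps being re-posted for as long as the process still has services executing.

The code above involves two fields:

mAm is the ActivityManagerService. It is started by SystemServer and runs on its own thread.

mHandler is the ActivityManagerService's Handler (of class MainHandler). It does not run on the thread that created AMS, but on a HandlerThread that AMS starts.

proc.execServicesFg tells whether the service counts as foreground or background, and that determines the delay.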

// ActiveServices
// ANR timeout for foreground services
static final int SERVICE_TIMEOUT = 20*1000;
// ANR timeout for background services
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
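The "arm a delayed message, cancel it if the work finishes in time" idea can be sketched in plain Java. This is an analogy only: a ScheduledExecutorService stands in for the AMS Handler, and ServiceTimeoutSketch, serviceStarting, serviceDone, and demo are hypothetical names. The cancel step is an assumption about how the timeout is withdrawn once the service finishes; the excerpts above only show the arming side.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Plain-Java analogy of a delayed timeout message; not framework code.
final class ServiceTimeoutSketch {
    static final long SERVICE_TIMEOUT = 20 * 1000;
    static final long SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

    // Picks the delay the same way the excerpt does with proc.execServicesFg.
    static long timeoutFor(boolean foreground) {
        return foreground ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT;
    }

    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean anrReported = new AtomicBoolean(false);
    private ScheduledFuture<?> pending;

    // Arms the timeout, playing the role of sendMessageDelayed().
    synchronized void serviceStarting(long delayMs) {
        pending = timer.schedule(() -> anrReported.set(true),
                delayMs, TimeUnit.MILLISECONDS);
    }

    // Assumed counterpart: the service finished in time, so withdraw the timeout.
    synchronized void serviceDone() {
        if (pending != null) {
            pending.cancel(false);
        }
    }

    boolean anrReported() {
        return anrReported.get();
    }

    void shutdown() {
        timer.shutdownNow();
    }

    // Returns {ANR reported for a service that finished, ANR reported for one that hung}.
    static boolean[] demo() {
        ServiceTimeoutSketch fast = new ServiceTimeoutSketch();
        ServiceTimeoutSketch slow = new ServiceTimeoutSketch();
        try {
            fast.serviceStarting(100);
            fast.serviceDone();        // finished before the timeout fired
            slow.serviceStarting(100); // never reports completion
            Thread.sleep(400);
            return new boolean[] { fast.anrReported(), slow.anrReported() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            fast.shutdown();
            slow.shutdown();
        }
    }
}
```

The demo uses a 100 ms delay instead of the real 20s and 200s values so that it finishes quickly.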

Now let's see how AMS handles the ActivityManagerService.SERVICE_TIMEOUT_MSG message.

// ActiveServices
void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;
    synchronized(mAm) {
        // The process has no executing services that need checking
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        final long now = SystemClock.uptimeMillis();
        final long maxTime =  now -
                (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ServiceRecord timeout = null;
        long nextTime = 0;
        for (int i=proc.executingServices.size()-1; i>=0; i--) {
            ServiceRecord sr = proc.executingServices.valueAt(i);
            // Walk the services and check whether any has exceeded the ANR timeout
            if (sr.executingStart < maxTime) {
                // Found a service that timed out
                timeout = sr;
                break;
            }
            if (sr.executingStart > nextTime) {
                nextTime = sr.executingStart;
            }
        }
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            Slog.w(TAG, "Timeout executing service: " + timeout);
            StringWriter sw = new StringWriter();
            PrintWriter pw = new FastPrintWriter(sw, false, 1024);
            pw.println(timeout);
            // Dump the state of the timed-out service into sw
            timeout.dump(pw, "    ");
            pw.close();
            mLastAnrDump = sw.toString();
            mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
            mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
            anrMessage = "executing service " + timeout.shortName;
        } else {
            // No service has timed out; re-post the delayed
            // ActivityManagerService.SERVICE_TIMEOUT_MSG
            Message msg = mAm.mHandler.obtainMessage(
                    ActivityManagerService.SERVICE_TIMEOUT_MSG);
            msg.obj = proc;
            mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg
                    ? (nextTime+SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT));
        }
    }
    if (anrMessage != null) {
        // An ANR occurred; hand it to AppErrors
        mAm.mAppErrors.appNotResponding(proc, null, null, false, anrMessage);
    }
}

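Summary

Service ANR detection uses the AMS HandlerThread to post delayed messages and process them when they come due. If, while handling such a message, a Service is found to have been executing longer than the limit, that Service is considered to have hit an ANR; the relevant information is recorded and handed to AppErrors.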
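The scan inside serviceTimeout() can be isolated as a small pure function. The sketch below follows the loop in the excerpt, but ServiceTimeoutScan, Result, and scan are hypothetical names, and plain long values stand in for ServiceRecord.executingStart. The empty-list case, which the original handles with an early return, is left to the caller.

```java
// Sketch of the scan in serviceTimeout(); names are illustrative only.
final class ServiceTimeoutScan {

    static final class Result {
        final int timedOutIndex; // index of the timed-out service, or -1
        final long nextCheckAt;  // when to check again, or -1 if one timed out

        Result(int timedOutIndex, long nextCheckAt) {
            this.timedOutIndex = timedOutIndex;
            this.nextCheckAt = nextCheckAt;
        }
    }

    // executingStart holds the uptime (ms) at which each service began executing.
    static Result scan(long[] executingStart, long now, long timeoutMs) {
        long maxTime = now - timeoutMs; // anything started before this has timed out
        long nextTime = 0;
        for (int i = executingStart.length - 1; i >= 0; i--) {
            if (executingStart[i] < maxTime) {
                return new Result(i, -1);
            }
            if (executingStart[i] > nextTime) {
                nextTime = executingStart[i];
            }
        }
        // Nothing timed out: re-check relative to the most recently started service.
        return new Result(-1, nextTime + timeoutMs);
    }
}
```

For example, with a 20000 ms timeout and now = 25000, a service that started at 1000 has timed out, while services that started at 10000 and 15000 have not, and the next check lands at 15000 + 20000 = 35000.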

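4. How InputEvent ANRs are detected

Android input events are dispatched by InputDispatcher in the native layer, and that includes input ANR detection.

From 《ANR机制以及问题分析》 (ANR Mechanism and Problem Analysis):
InputDispatcherThread is a thread; each pass of its loop dispatches one message.
An input event is a message that has to queue for dispatch. Each Connection maintains two queues:
* outboundQueue: events waiting to be sent to the window. Every new message enters this queue first.
* waitQueue: events that have already been sent to the window.
Once publishKeyEvent completes, the event has been dispatched, so it is moved from outboundQueue to waitQueue.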
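The two queues can be sketched in Java, although the real InputDispatcher is native C++ code. Everything here is illustrative: ConnectionSketch, Event, publishNext, finishOldest, and isUnresponsive are hypothetical names, and the unresponsiveness check (the oldest delivered event has waited at least the timeout) is a simplifying assumption, not the dispatcher's exact rule.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified Java model of a Connection's two queues; not the native code.
final class ConnectionSketch {

    static final class Event {
        final String name;
        long deliveryTimeMs; // set when the event is published to the window

        Event(String name) {
            this.name = name;
        }
    }

    final Deque<Event> outboundQueue = new ArrayDeque<>(); // waiting to be sent
    final Deque<Event> waitQueue = new ArrayDeque<>();     // sent, not yet finished

    // Every new event enters outboundQueue first.
    void enqueue(Event e) {
        outboundQueue.addLast(e);
    }

    // Publishing moves the head of outboundQueue to waitQueue.
    void publishNext(long nowMs) {
        Event e = outboundQueue.pollFirst();
        if (e == null) {
            return;
        }
        e.deliveryTimeMs = nowMs;
        waitQueue.addLast(e);
    }

    // The window reports that it finished the oldest delivered event.
    void finishOldest() {
        waitQueue.pollFirst();
    }

    // Assumed check: the oldest delivered event has waited at least timeoutMs.
    boolean isUnresponsive(long nowMs, long timeoutMs) {
        Event oldest = waitQueue.peekFirst();
        return oldest != null && nowMs - oldest.deliveryTimeMs >= timeoutMs;
    }
}
```

An event that stays in waitQueue is one the window has received but not yet acknowledged, which is why that queue is the natural place to look for a stuck window.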

On the Java side, when an input ANR occurs in the native layer, InputManagerService's notifyANR() method is called.

// Native callback.
private long notifyANR(InputApplicationHandle inputApplicationHandle,
        InputWindowHandle inputWindowHandle, String reason) {
    return mWindowManagerCallbacks.notifyANR(
            inputApplicationHandle, inputWindowHandle, reason);
}

Here mWindowManagerCallbacks is the InputMonitor, so its notifyANR() method is called.

// InputMonitor
/* Notifies the window manager about an application that is not responding.
 * Returns a new timeout to continue waiting in nanoseconds, or 0 to abort dispatch.
 * Called by the InputManager.
 */
@Override
public long notifyANR(InputApplicationHandle inputApplicationHandle,
        InputWindowHandle inputWindowHandle, String reason) {
    AppWindowToken appWindowToken = null;
    WindowState windowState = null;
    ...
    if (appWindowToken != null && appWindowToken.appToken != null) {
        ...
        // The current activity exists
        final boolean abort = controller != null
                && controller.keyDispatchingTimedOut(reason,
                        (windowState != null) ? windowState.mSession.mPid : -1);
        if (!abort) {
            return appWindowToken.mInputDispatchingTimeoutNanos;
        }
    } else if (windowState != null) {
        try {
            long timeout = ActivityManager.getService().inputDispatchingTimedOut(
                    windowState.mSession.mPid, aboveSystem, reason);
            if (timeout >= 0) {
                return timeout * 1000000L; // nanoseconds
            }
        } catch (RemoteException ex) {
        }
    }
    return 0;
}

There are two ANR cases above:

  • keyDispatchingTimedOut: the current Activity exists; handled by ActivityRecord
  • inputDispatchingTimedOut: the current Activity does not exist; handled by AMS

Start with keyDispatchingTimedOut:

// ActivityRecord
@Override
public boolean keyDispatchingTimedOut(String reason, int windowPid) {
    ActivityRecord anrActivity;
    ProcessRecord anrApp;
    boolean windowFromSameProcessAsActivity;
    synchronized (service) {
        anrActivity = getWaitingHistoryRecordLocked();
        anrApp = app;
        windowFromSameProcessAsActivity =
                app == null || app.pid == windowPid || windowPid == -1;
    }
    if (windowFromSameProcessAsActivity) {
        return service.inputDispatchingTimedOut(anrApp, anrActivity, this, false, reason);
    } else {
        return service.inputDispatchingTimedOut(windowPid, false /* aboveSystem */, reason) < 0;
    }
}

Either way, the call ends up in AMS's inputDispatchingTimedOut():

// ActivityManagerService
public boolean inputDispatchingTimedOut(final ProcessRecord proc,
        final ActivityRecord activity, final ActivityRecord parent,
        final boolean aboveSystem, String reason) {
    if (checkCallingPermission(android.Manifest.permission.FILTER_EVENTS)
            != PackageManager.PERMISSION_GRANTED) {
        throw new SecurityException("Requires permission "
                + android.Manifest.permission.FILTER_EVENTS);
    }

    final String annotation;
    if (reason == null) {
        annotation = "Input dispatching timed out";
    } else {
        annotation = "Input dispatching timed out (" + reason + ")";
    }

    if (proc != null) {
        synchronized (this) {
            if (proc.debugging) {
                return false;
            }

            if (proc.instr != null) {
                Bundle info = new Bundle();
                info.putString("shortMsg", "keyDispatchingTimedOut");
                info.putString("longMsg", annotation);
                finishInstrumentationLocked(proc, Activity.RESULT_CANCELED, info);
                return true;
            }
        }
        mHandler.post(new Runnable() {
            @Override
            public void run() {
                mAppErrors.appNotResponding(proc, activity, parent, aboveSystem, annotation);
            }
        });
    }

    return true;
}

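In the end it is again handed to AppErrors.

Summary

InputEvent ANR detection lives in the native InputDispatcher. When it detects an ANR, it calls notifyANR() on the Java-side InputManagerService, and in the end AppErrors collects the information, shows the dialog, and so on.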
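The return-value contract that ties these methods together is easy to lose in the excerpts: inputDispatchingTimedOut() returns false for a process being debugged and true otherwise, and notifyANR() turns "do not abort" into a fresh timeout in nanoseconds and "abort" into 0. A minimal sketch of that contract, with hypothetical names (InputAnrContract, abortDispatch, nextTimeoutNanos) and a plain millisecond parameter standing in for the window's configured timeout:

```java
import java.util.concurrent.TimeUnit;

// Sketch of the return-value contract only; names are illustrative.
final class InputAnrContract {

    // Mirrors inputDispatchingTimedOut(): false means "keep waiting",
    // which the excerpt returns only for a known process that is being debugged.
    static boolean abortDispatch(boolean processKnown, boolean debugging) {
        if (processKnown && debugging) {
            return false;
        }
        return true;
    }

    // Mirrors notifyANR(): a new timeout in nanoseconds, or 0 to abort dispatch.
    static long nextTimeoutNanos(boolean abort, long timeoutMs) {
        return abort ? 0L : TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    }
}
```

With the 5s input timeout, a debugged process yields nextTimeoutNanos(false, 5000) = 5,000,000,000 ns, so the dispatcher simply waits another round instead of raising an ANR.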

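5. Broadcast ANRs

A Broadcast ANR happens when onReceive takes too long.
Just as ActiveServices does for services, BroadcastQueue is what actually manages broadcasts.

The rough flow of a broadcast:

When an app process is created, AMS calls sendPendingBroadcastsLocked(app),
sendPendingBroadcastsLocked() calls processCurBroadcastLocked(),
and app.thread.scheduleReceiver() delivers the broadcast to the app process, completing the flow.

ANR detection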

The timeout is armed by setBroadcastTimeoutLocked(timeoutTime), which processNextBroadcast() calls before handing an ordered broadcast to its next receiver:

final void setBroadcastTimeoutLocked(long timeoutTime) {
    if (! mPendingBroadcastTimeoutMessage) {
        Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
        mHandler.sendMessageAtTime(msg, timeoutTime);
        mPendingBroadcastTimeoutMessage = true;
    }
}

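This posts a BROADCAST_TIMEOUT_MSG to mHandler; the basic mechanism is much the same as Service ANR detection.

This mHandler also processes its messages on the HandlerThread created by AMS, the same thread that handles ActivityManagerService.SERVICE_TIMEOUT_MSG for services.

Now look at how mHandler handles the BROADCAST_TIMEOUT_MSG message.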

// BroadcastQueue.java
public void handleMessage(Message msg) {
    switch (msg.what) {
        case BROADCAST_INTENT_MSG: {
            if (DEBUG_BROADCAST) Slog.v(
                    TAG_BROADCAST, "Received BROADCAST_INTENT_MSG");
            processNextBroadcast(true);
        } break;
        case BROADCAST_TIMEOUT_MSG: {
            synchronized (mService) {
                broadcastTimeoutLocked(true);
            }
        } break;
    }
}

Finally, BroadcastQueue's broadcastTimeoutLocked():

final void broadcastTimeoutLocked(boolean fromMsg) {
    ...
    if (mOrderedBroadcasts.size() == 0) {
        return;
    }
    long now = SystemClock.uptimeMillis();
    BroadcastRecord r = mOrderedBroadcasts.get(0);
    if (fromMsg) {
        ...
        long timeoutTime = r.receiverTime + mTimeoutPeriod;
        // Not timed out yet; re-post the delayed message
        if (timeoutTime > now) {
            setBroadcastTimeoutLocked(timeoutTime);
            return;
        }
    }
    ...
    // Reaching this point means an ANR; start collecting process and broadcast info
    r.receiverTime = now;
    if (!debugging) {
        r.anrCount++;
    }
    ProcessRecord app = null;
    String anrMessage = null;
    Object curReceiver;
    if (r.nextReceiver > 0) {
        curReceiver = r.receivers.get(r.nextReceiver-1);
        r.delivery[r.nextReceiver-1] = BroadcastRecord.DELIVERY_TIMEOUT;
    } else {
        curReceiver = r.curReceiver;
    }
    Slog.w(TAG, "Receiver during timeout of " + r + " : " + curReceiver);
    logBroadcastReceiverDiscardLocked(r);
    if (curReceiver != null && curReceiver instanceof BroadcastFilter) {
        BroadcastFilter bf = (BroadcastFilter)curReceiver;
        if (bf.receiverList.pid != 0
                && bf.receiverList.pid != ActivityManagerService.MY_PID) {
            synchronized (mService.mPidsSelfLocked) {
                app = mService.mPidsSelfLocked.get(
                        bf.receiverList.pid);
            }
        }
    } else {
        app = r.curApp;
    }
    if (app != null) {
        anrMessage = "Broadcast of " + r.intent.toString();
    }
    if (mPendingBroadcast == r) {
        mPendingBroadcast = null;
    }
    // Finish the current receiver and move on to the next one
    finishReceiverLocked(r, r.resultCode, r.resultData,
            r.resultExtras, r.resultAbort, false);
    scheduleBroadcastsLocked();
    if (!debugging && anrMessage != null) {
        // Handle the ANR
        mHandler.post(new AppNotResponding(app, anrMessage));
    }
}

private final class AppNotResponding implements Runnable {
    ...
    @Override
    public void run() {
        mService.mAppErrors.appNotResponding(mApp, null, null, false, mAnnotation);
    }
}

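As with services, the ANR is finally handed to AppErrors in AMS, which collects the ANR information, shows the dialog, and so on.

Summary

Broadcast ANR detection works essentially the same way as for services: a handler posts delayed messages, the timeout check runs on the AMS HandlerThread, and the result is again handed to AppErrors.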
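The "one pending timeout message, re-armed until the receiver really is late" logic can be modelled without any Android classes. In this sketch BroadcastTimeoutSketch, setTimeout, onTimeoutMessage, and scheduledAt are hypothetical names. Clearing the pending flag when the message fires is an assumption about the part of broadcastTimeoutLocked() that the excerpt elides with "...".

```java
// Plain-Java model of the broadcast timeout bookkeeping; names are illustrative.
final class BroadcastTimeoutSketch {
    private final long timeoutPeriodMs;
    private boolean pendingTimeoutMessage;
    private long scheduledAt = -1; // when the single pending message is due

    BroadcastTimeoutSketch(long timeoutPeriodMs) {
        this.timeoutPeriodMs = timeoutPeriodMs;
    }

    // Like setBroadcastTimeoutLocked(): at most one timeout message is pending.
    void setTimeout(long timeoutTime) {
        if (!pendingTimeoutMessage) {
            scheduledAt = timeoutTime;
            pendingTimeoutMessage = true;
        }
    }

    // Called when the timeout message fires; returns true if this is an ANR.
    boolean onTimeoutMessage(long now, long receiverTime) {
        pendingTimeoutMessage = false; // assumed: the fired message is no longer pending
        long timeoutTime = receiverTime + timeoutPeriodMs;
        if (timeoutTime > now) {
            // The current receiver started later than the one the message was
            // armed for, so it still has time left: re-arm and keep waiting.
            setTimeout(timeoutTime);
            return false;
        }
        return true;
    }

    long scheduledAt() {
        return scheduledAt;
    }
}
```

The re-arm branch exists because the message is scheduled once per queue rather than once per receiver: by the time it fires, the queue may already be on a newer receiver whose own deadline has not passed.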

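Locating the problem

Now that we roughly know how each kind of ANR arises, how do we track one down?

dumpsys

  1. Capture current CPU usage:
    adb shell dumpsys cpuinfo

  2. Capture current memory usage:
    adb shell dumpsys meminfo

  3. Capture the current state of activities and services:
    adb shell dumpsys activity
    adb shell service list

dumpsys gives a rough picture of the overall environment.

logcat

Search for the am_anr and anr keywords. With luck you will find the process that hit the ANR
(several times I have seen the ANR dialog on screen with no ANR entry in logcat at all).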
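When a log is large, a small filter can pull the interesting fields out of am_anr entries. The sketch below assumes the event-log layout am_anr : [user,pid,package,flags,reason]; that layout can differ between Android versions, so treat it as an assumption to check against your own logs. AmAnrLine and parse are hypothetical names, and the sample line in the usage note is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts fields from an am_anr log line; the field layout is an assumption.
final class AmAnrLine {
    // Assumed layout: am_anr : [user,pid,package,flags,reason]
    private static final Pattern AM_ANR = Pattern.compile(
            "am_anr\\s*:\\s*\\[(-?\\d+),(\\d+),([^,]+),(-?\\d+),(.*)\\]");

    // Returns {pid, package, reason}, or null if the line is not an am_anr entry.
    static String[] parse(String line) {
        Matcher m = AM_ANR.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(2), m.group(3), m.group(5) };
    }
}
```

Given a made-up line ending in `am_anr  : [0,1234,com.example.app,952647238,Input dispatching timed out (Waiting to send key event)]`, parse returns the pid 1234, the package com.example.app, and the reason text.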

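traces.txt

/data/anr/traces.txt
Normally the process that hit the ANR appears at the very top of the traces file, along with information on all of its threads, including stack traces.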
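Since the process order in the file matters, it can help to list the processes in the order they appear. The sketch below assumes each process section starts with a header such as "----- pid 1234 at <timestamp> -----" followed by a "Cmd line: <process name>" line; the exact format varies by Android version, so this is an assumption to verify against a real file. TracesIndex and processes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Lists the processes in a traces file in file order; the format is an assumption.
final class TracesIndex {
    private static final Pattern HEADER =
            Pattern.compile("^----- pid (\\d+) at (.+) -----$");
    private static final String CMD_PREFIX = "Cmd line: ";

    // Returns entries of the form "<pid> <process name>", first process first.
    static List<String> processes(List<String> lines) {
        List<String> out = new ArrayList<>();
        String pid = null;
        for (String line : lines) {
            Matcher m = HEADER.matcher(line.trim());
            if (m.matches()) {
                pid = m.group(1); // remember the pid until its "Cmd line" shows up
                continue;
            }
            if (pid != null && line.startsWith(CMD_PREFIX)) {
                out.add(pid + " " + line.substring(CMD_PREFIX.length()).trim());
                pid = null;
            }
        }
        return out;
    }
}
```

The first entry of the returned list is the first process in the file, which is where the ANR process normally sits, and that section's "main" thread stack is usually the place to start reading.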

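Using platforms such as Bugly

ANRs usually happen because our code is not robust enough. Bugly can help locate the problem fairly quickly, but it is not a cure-all.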

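Thanks

Many thanks to the author of this article, which gave me an overall understanding of ANR:

https://duanqz.github.io/2015-10-12-ANR-Analysis#service

Original article by 奋斗. If you repost it, please credit the source: https://blog.ytso.com/tech/app/6265.html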
