A Brief Analysis of How ANR Works

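I recently ran into an ANR problem that I had to analyze and track down. ANR problems are painful: sometimes you cannot spot the cause even with the traces.txt log in hand, because ANRs usually come with high CPU and memory usage, which makes them hard to pin down.

I spent some time studying how ANRs are triggered on Android and how the system detects them. These are my notes for future reference; the palest ink beats the best memory.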

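1. Introduction to ANR

ANR (Application Not Responding) means that an app has failed to respond within a certain period of time.

The root cause is that the UI thread cannot get to a message for a long time, or takes too long to process one.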
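To make the root cause concrete, here is a minimal plain-Java sketch (not Android code). A single-threaded executor stands in for the UI thread, and a probe task is posted to check whether it gets to run before a deadline. The names MainThreadProbe, isResponsive and sleepQuietly are hypothetical and exist only for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class MainThreadProbe {
    /** Posts a no-op probe task and reports whether it ran within timeoutMs. */
    static boolean isResponsive(ExecutorService mainThread, long timeoutMs) {
        CountDownLatch probeRan = new CountDownLatch(1);
        mainThread.execute(probeRan::countDown);
        try {
            return probeRan.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Simulates a long-running message handler. */
    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the UI thread: one thread, messages handled in order.
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            // Nothing is queued, so the probe should run well within the deadline.
            System.out.println("idle: " + isResponsive(mainThread, 500));
            // A slow handler occupies the thread; the probe queues up behind it.
            mainThread.execute(() -> sleepQuietly(1000));
            System.out.println("busy: " + isResponsive(mainThread, 200));
        } finally {
            mainThread.shutdownNow();
        }
    }
}
```

The probe is never slow itself; it misses the deadline only because an earlier message is still being handled, which is the situation every ANR check in the sections that follow is trying to catch.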

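2. Common types of ANR

There are three main categories:

  • InputEvent (input events): 5s
  • Service: 20s for foreground services, 200s for background services
  • Broadcast: 10s for the foreground queue, 60s for the background queue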

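3. How Service ANRs are detected

Service ANR detection is built on delayed (timed) messages.

If you have studied how a Service is started, you know that AMS only dispatches the work; the class that actually handles starting a Service is ActiveServices.

ActiveServices has a scheduleServiceTimeoutLocked method, which is called when a service is created.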

// ActiveServices
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    mAm.mHandler.sendMessageDelayed(msg,
            proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}

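After a service starts executing, a delayed ActivityManagerService.SERVICE_TIMEOUT_MSG is posted, and it keeps being re-posted for as long as services are still executing without having timed out.

Two variables appear in the code above:

mAm is the ActivityManagerService. It is started by SystemServer and runs on its own thread.

mHandler is the ActivityManagerService's Handler. It does not run on the AMS thread itself, but on a HandlerThread started by AMS (the handler class is MainHandler).

proc.execServicesFg tells whether the service is a foreground or a background service, which determines the delay.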

// ActiveServices
// ANR timeout for foreground services
static final int SERVICE_TIMEOUT = 20*1000;
// ANR timeout for background services
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
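The arm-then-disarm pattern behind sendMessageDelayed can be sketched in plain Java. This is only an illustration under stated assumptions: a ScheduledExecutorService stands in for the AMS Handler, and DelayedTimeoutSketch, arm and disarm are hypothetical names. The disarm step models removing the pending message once the service finishes in time, which the excerpts in this article do not show.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class DelayedTimeoutSketch {
    // Same values as the constants above, expressed in milliseconds.
    static final long SERVICE_TIMEOUT_MS = 20_000;
    static final long SERVICE_BACKGROUND_TIMEOUT_MS = SERVICE_TIMEOUT_MS * 10;

    // Stand-in for the Handler: one thread that runs tasks after a delay.
    private final ScheduledExecutorService handler =
            Executors.newSingleThreadScheduledExecutor();

    /** Picks the delay the way scheduleServiceTimeoutLocked does. */
    static long timeoutFor(boolean foreground) {
        return foreground ? SERVICE_TIMEOUT_MS : SERVICE_BACKGROUND_TIMEOUT_MS;
    }

    /** Arms a timeout: onTimeout runs after delayMs unless it is disarmed first. */
    ScheduledFuture<?> arm(long delayMs, Runnable onTimeout) {
        return handler.schedule(onTimeout, delayMs, TimeUnit.MILLISECONDS);
    }

    /** Disarms a pending timeout (assumed to happen when the service finishes). */
    static void disarm(ScheduledFuture<?> pending) {
        pending.cancel(false);
    }

    void shutdown() {
        handler.shutdownNow();
    }

    public static void main(String[] args) {
        DelayedTimeoutSketch sketch = new DelayedTimeoutSketch();
        ScheduledFuture<?> pending = sketch.arm(timeoutFor(true),
                () -> System.out.println("service timed out"));
        // The service finished in time, so the timeout never fires.
        disarm(pending);
        sketch.shutdown();
    }
}
```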

Let's look at how AMS handles the ActivityManagerService.SERVICE_TIMEOUT_MSG message:

// ActiveServices
void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;
    synchronized(mAm) {
        // No services that need checking are executing in this process
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        final long now = SystemClock.uptimeMillis();
        final long maxTime =  now -
                (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ServiceRecord timeout = null;
        long nextTime = 0;
        for (int i=proc.executingServices.size()-1; i>=0; i--) {
            ServiceRecord sr = proc.executingServices.valueAt(i);
            // Walk the services and check whether any has exceeded the ANR timeout
            if (sr.executingStart < maxTime) {
                // Found a service that has timed out
                timeout = sr;
                break;
            }
            if (sr.executingStart > nextTime) {
                nextTime = sr.executingStart;
            }
        }
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            Slog.w(TAG, "Timeout executing service: " + timeout);
            StringWriter sw = new StringWriter();
            PrintWriter pw = new FastPrintWriter(sw, false, 1024);
            pw.println(timeout);
            // Dump the service's current state into sw
            timeout.dump(pw, "    ");
            pw.close();
            mLastAnrDump = sw.toString();
            mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
            mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
            anrMessage = "executing service " + timeout.shortName;
        } else {
            // No service has timed out; post another delayed
            // ActivityManagerService.SERVICE_TIMEOUT_MSG
            Message msg = mAm.mHandler.obtainMessage(
                    ActivityManagerService.SERVICE_TIMEOUT_MSG);
            msg.obj = proc;
            mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg
                    ? (nextTime+SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT));
        }
    }
    if (anrMessage != null) {
        // An ANR occurred; hand it over to AppErrors
        mAm.mAppErrors.appNotResponding(proc, null, null, false, anrMessage);
    }
}

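Summary

Service ANR detection relies on the AMS HandlerThread: delayed messages are posted at intervals and handled when they come due. If, while handling such a message, a Service is found to have been executing for longer than the limit, that Service is considered to have ANRed; the relevant information is recorded and handed over to AppErrors.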
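The scan inside serviceTimeout can be isolated as a small pure function, which makes the maxTime and nextTime arithmetic easier to follow. The sketch below mirrors that loop over plain start times; ServiceTimeoutScan, Result and scan are hypothetical names, not AOSP classes.

```java
final class ServiceTimeoutScan {
    /** Outcome of one scan: either a timed-out entry or the next time to check. */
    static final class Result {
        final int timedOutIndex; // index of the timed-out entry, or -1 if none
        final long nextCheckAt;  // when to check again, or -1 if something timed out

        Result(int timedOutIndex, long nextCheckAt) {
            this.timedOutIndex = timedOutIndex;
            this.nextCheckAt = nextCheckAt;
        }
    }

    /**
     * Mirrors the loop in serviceTimeout: an entry has timed out if it started
     * before (now - timeoutMs). Otherwise the next check is scheduled at the
     * latest start time seen plus the timeout.
     */
    static Result scan(long[] executingStart, long now, long timeoutMs) {
        long maxTime = now - timeoutMs;
        long nextTime = 0;
        for (int i = executingStart.length - 1; i >= 0; i--) {
            if (executingStart[i] < maxTime) {
                return new Result(i, -1);
            }
            if (executingStart[i] > nextTime) {
                nextTime = executingStart[i];
            }
        }
        return new Result(-1, nextTime + timeoutMs);
    }

    public static void main(String[] args) {
        // Started at 1000 and 5000, checked at 22000 with a 20000 ms limit:
        // maxTime is 2000, so the entry that started at 1000 has timed out.
        Result r = scan(new long[] {1000, 5000}, 22_000, 20_000);
        System.out.println("timed out index: " + r.timedOutIndex);
    }
}
```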

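4. How InputEvent ANRs are detected

All Android input events are dispatched by the InputDispatcher in the native layer, and that includes ANR detection for input events.

Based on 《ANR机制以及问题分析》 ("The ANR Mechanism and Problem Analysis"):
InputDispatcherThread is a thread; each iteration of its loop dispatches one message.
An input event is a message that has to queue up for dispatch. Each Connection maintains two queues:
* outboundQueue: events waiting to be sent to the window. Every new message enters this queue first.
* waitQueue: events that have already been sent to the window
Once publishKeyEvent completes, the event has been dispatched, so it is moved from outboundQueue to waitQueue.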
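The two-queue bookkeeping can be modeled in a few lines of plain Java. This is a deliberately simplified sketch, not the native InputDispatcher code: ConnectionQueuesSketch and its methods are hypothetical, and the timeout rule used here (the oldest dispatched event has been waiting longer than the limit) is an assumption made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ConnectionQueuesSketch {
    private static final class Event {
        final String name;
        long dispatchedAt = -1;

        Event(String name) {
            this.name = name;
        }
    }

    // Events waiting to be sent to the window.
    private final Deque<Event> outboundQueue = new ArrayDeque<>();
    // Events already sent to the window but not yet reported as finished.
    private final Deque<Event> waitQueue = new ArrayDeque<>();

    /** Every new event enters the outbound queue first. */
    void enqueue(String name) {
        outboundQueue.addLast(new Event(name));
    }

    /** Models a completed publish: the head moves from outbound to wait. */
    boolean publishNext(long now) {
        Event e = outboundQueue.pollFirst();
        if (e == null) {
            return false;
        }
        e.dispatchedAt = now;
        waitQueue.addLast(e);
        return true;
    }

    /** Models the window reporting that it finished the oldest event. */
    boolean finishOldest() {
        return waitQueue.pollFirst() != null;
    }

    /** Assumed rule: timed out if the oldest dispatched event waited too long. */
    boolean isTimedOut(long now, long timeoutMs) {
        Event oldest = waitQueue.peekFirst();
        return oldest != null && now - oldest.dispatchedAt >= timeoutMs;
    }

    int outboundSize() {
        return outboundQueue.size();
    }

    int waitSize() {
        return waitQueue.size();
    }

    public static void main(String[] args) {
        ConnectionQueuesSketch c = new ConnectionQueuesSketch();
        c.enqueue("KEY_DOWN");
        c.publishNext(0);
        // The window never reports completion, so the event stays in waitQueue.
        System.out.println("timed out at 5000 ms: " + c.isTimedOut(5000, 5000));
    }
}
```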

On the Java side, when the native layer detects an input ANR, it calls InputManagerService's notifyANR() method:

// Native callback.
private long notifyANR(InputApplicationHandle inputApplicationHandle,
        InputWindowHandle inputWindowHandle, String reason) {
    return mWindowManagerCallbacks.notifyANR(
            inputApplicationHandle, inputWindowHandle, reason);
}

Here mWindowManagerCallbacks is the InputMonitor, so it is InputMonitor's notifyANR() method that gets called.

// InputMonitor
/* Notifies the window manager about an application that is not responding.
 * Returns a new timeout to continue waiting in nanoseconds, or 0 to abort dispatch.
 * Called by the InputManager.
 */
@Override
public long notifyANR(InputApplicationHandle inputApplicationHandle,
        InputWindowHandle inputWindowHandle, String reason) {
    AppWindowToken appWindowToken = null;
    WindowState windowState = null;
    ...
    if (appWindowToken != null && appWindowToken.appToken != null) {
        ...
        // The current activity exists
        final boolean abort = controller != null
                && controller.keyDispatchingTimedOut(reason,
                        (windowState != null) ? windowState.mSession.mPid : -1);
        if (!abort) {
            return appWindowToken.mInputDispatchingTimeoutNanos;
        }
    } else if (windowState != null) {
        try {
            long timeout = ActivityManager.getService().inputDispatchingTimedOut(
                    windowState.mSession.mPid, aboveSystem, reason);
            if (timeout >= 0) {
                return timeout * 1000000L; // nanoseconds
            }
        } catch (RemoteException ex) {
        }
    }
    return 0;
}

The code above covers two ANR cases:

  • keyDispatchingTimedOut: the current Activity exists; handled by ActivityRecord
  • inputDispatchingTimedOut: the current Activity does not exist; handled by AMS

First, keyDispatchingTimedOut:

// ActivityRecord
@Override
public boolean keyDispatchingTimedOut(String reason, int windowPid) {
    ActivityRecord anrActivity;
    ProcessRecord anrApp;
    boolean windowFromSameProcessAsActivity;
    synchronized (service) {
        anrActivity = getWaitingHistoryRecordLocked();
        anrApp = app;
        windowFromSameProcessAsActivity =
                app == null || app.pid == windowPid || windowPid == -1;
    }
    if (windowFromSameProcessAsActivity) {
        return service.inputDispatchingTimedOut(anrApp, anrActivity, this, false, reason);
    } else {
        return service.inputDispatchingTimedOut(windowPid, false /* aboveSystem */, reason) < 0;
    }
}

Both paths end up in AMS's inputDispatchingTimedOut():

public boolean inputDispatchingTimedOut(final ProcessRecord proc,
        final ActivityRecord activity, final ActivityRecord parent,
        final boolean aboveSystem, String reason) {
    if (checkCallingPermission(android.Manifest.permission.FILTER_EVENTS)
            != PackageManager.PERMISSION_GRANTED) {
        throw new SecurityException("Requires permission "
                + android.Manifest.permission.FILTER_EVENTS);
    }
    final String annotation;
    if (reason == null) {
        annotation = "Input dispatching timed out";
    } else {
        annotation = "Input dispatching timed out (" + reason + ")";
    }
    if (proc != null) {
        synchronized (this) {
            if (proc.debugging) {
                return false;
            }
            if (proc.instr != null) {
                Bundle info = new Bundle();
                info.putString("shortMsg", "keyDispatchingTimedOut");
                info.putString("longMsg", annotation);
                finishInstrumentationLocked(proc, Activity.RESULT_CANCELED, info);
                return true;
            }
        }
        mHandler.post(new Runnable() {
            @Override
            public void run() {
                mAppErrors.appNotResponding(proc, activity, parent, aboveSystem, annotation);
            }
        });
    }
    return true;
}

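In the end, the ANR is again handed over to AppErrors.

Summary

ANR detection for InputEvent lives in the native-layer InputDispatcher. When it detects an ANR, it calls notifyANR() on the Java-layer InputManagerService, and in the end it is still AppErrors that collects the information, shows the dialog, and so on.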

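5. Broadcast ANR

A Broadcast ANR occurs when onReceive takes too long to run.
As with ActiveServices, the class that actually manages broadcasts is BroadcastQueue.

The rough broadcast flow:

When an app process is created, AMS calls sendPendingBroadcastsLocked(app),
sendPendingBroadcastsLocked() calls processCurBroadcastLocked(),
and app.thread.scheduleReceiver() delivers the broadcast to the user process, completing the flow.

ANR detection

processCurBroadcastLocked() calls setBroadcastTimeoutLocked(timeoutTime):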

final void setBroadcastTimeoutLocked(long timeoutTime) {
    if (! mPendingBroadcastTimeoutMessage) {
        Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
        mHandler.sendMessageAtTime(msg, timeoutTime);
        mPendingBroadcastTimeoutMessage = true;
    }
}

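This posts a BROADCAST_TIMEOUT_MSG to mHandler; the basic mechanism is much the same as Service ANR detection.

mHandler's messages are also handled on the HandlerThread created by AMS, the same thread that handles ActivityManagerService.SERVICE_TIMEOUT_MSG for services.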
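The mPendingBroadcastTimeoutMessage flag keeps at most one timeout message in flight per queue. A minimal sketch of that guard follows; PendingTimeoutGuard is a hypothetical name, no real Handler is involved, and the cancel step is an assumption about how the flag gets reset when a receiver finishes in time.

```java
final class PendingTimeoutGuard {
    private boolean pending;       // plays the role of mPendingBroadcastTimeoutMessage
    private long scheduledAt = -1; // when the single pending timeout is due
    private int postCount;         // how many timeout messages were actually posted

    /** Posts a timeout only if none is pending; returns whether one was posted. */
    boolean set(long timeoutTime) {
        if (pending) {
            return false;
        }
        pending = true;
        scheduledAt = timeoutTime;
        postCount++;
        return true;
    }

    /** Clears the pending timeout (assumed to happen when the receiver finishes). */
    void cancel() {
        pending = false;
        scheduledAt = -1;
    }

    long scheduledAt() {
        return scheduledAt;
    }

    int postCount() {
        return postCount;
    }

    public static void main(String[] args) {
        PendingTimeoutGuard guard = new PendingTimeoutGuard();
        guard.set(10_000);
        guard.set(12_000); // ignored: a timeout is already pending
        System.out.println("posted: " + guard.postCount()
                + ", due at: " + guard.scheduledAt());
    }
}
```

Because the second call is ignored, the pending message can come due at a time computed for an earlier receiver, so the handler has to recompute the deadline when the message arrives.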

Now let's see how mHandler handles the BROADCAST_TIMEOUT_MSG message.

// BroadcastQueue.java
public void handleMessage(Message msg) {
    switch (msg.what) {
        case BROADCAST_INTENT_MSG: {
            if (DEBUG_BROADCAST) Slog.v(
                    TAG_BROADCAST, "Received BROADCAST_INTENT_MSG");
            processNextBroadcast(true);
        } break;
        case BROADCAST_TIMEOUT_MSG: {
            synchronized (mService) {
                broadcastTimeoutLocked(true);
            }
        } break;
    }
}

Finally, BroadcastQueue's broadcastTimeoutLocked():

final void broadcastTimeoutLocked(boolean fromMsg) {
    ...
    if (mOrderedBroadcasts.size() == 0) {
        return;
    }
    long now = SystemClock.uptimeMillis();
    BroadcastRecord r = mOrderedBroadcasts.get(0);
    if (fromMsg) {
        ...
        long timeoutTime = r.receiverTime + mTimeoutPeriod;
        // Not timed out yet; post another delayed message
        if (timeoutTime > now) {
            setBroadcastTimeoutLocked(timeoutTime);
            return;
        }
    }
    ...
    // Reaching this point means an ANR occurred; start collecting
    // process and broadcast information
    r.receiverTime = now;
    if (!debugging) {
        r.anrCount++;
    }
    ProcessRecord app = null;
    String anrMessage = null;
    Object curReceiver;
    if (r.nextReceiver > 0) {
        curReceiver = r.receivers.get(r.nextReceiver-1);
        r.delivery[r.nextReceiver-1] = BroadcastRecord.DELIVERY_TIMEOUT;
    } else {
        curReceiver = r.curReceiver;
    }
    Slog.w(TAG, "Receiver during timeout of " + r + " : " + curReceiver);
    logBroadcastReceiverDiscardLocked(r);
    if (curReceiver != null && curReceiver instanceof BroadcastFilter) {
        BroadcastFilter bf = (BroadcastFilter)curReceiver;
        if (bf.receiverList.pid != 0
                && bf.receiverList.pid != ActivityManagerService.MY_PID) {
            synchronized (mService.mPidsSelfLocked) {
                app = mService.mPidsSelfLocked.get(
                        bf.receiverList.pid);
            }
        }
    } else {
        app = r.curApp;
    }
    if (app != null) {
        anrMessage = "Broadcast of " + r.intent.toString();
    }
    if (mPendingBroadcast == r) {
        mPendingBroadcast = null;
    }
    // Finish the current receiver and move on to the next one
    finishReceiverLocked(r, r.resultCode, r.resultData,
            r.resultExtras, r.resultAbort, false);
    scheduleBroadcastsLocked();
    if (!debugging && anrMessage != null) {
        // Handle the ANR information
        mHandler.post(new AppNotResponding(app, anrMessage));
    }
}

private final class AppNotResponding implements Runnable {
    ...
    @Override
    public void run() {
        mService.mAppErrors.appNotResponding(mApp, null, null, false, mAnnotation);
    }
}

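As with Service ANRs, the work of collecting ANR information, showing the dialog and so on is ultimately handed to AppErrors in AMS.

Summary

Broadcast ANR detection works essentially the same way as for services: a handler posts delayed messages, the check for an ANR runs on the AMS HandlerThread, and the result is again handed to AppErrors.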
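The re-arm decision at the top of broadcastTimeoutLocked comes down to one comparison. It is written out below as a pure function so the boundary case is explicit; BroadcastTimeoutCheck, nextTimeoutOrAnr and the ANR sentinel are hypothetical names for this sketch.

```java
final class BroadcastTimeoutCheck {
    /** Sentinel meaning "the deadline has passed, treat this as an ANR". */
    static final long ANR = -1;

    /**
     * Mirrors the fromMsg branch: the deadline is recomputed from the time the
     * current receiver started. If it is still in the future, the caller should
     * re-arm the timeout for that time; otherwise the receiver has timed out.
     */
    static long nextTimeoutOrAnr(long receiverTime, long timeoutPeriod, long now) {
        long timeoutTime = receiverTime + timeoutPeriod;
        return timeoutTime > now ? timeoutTime : ANR;
    }

    public static void main(String[] args) {
        // Receiver started at 1000 with a 10000 ms budget, message handled at 5000:
        // the deadline (11000) is still ahead, so the timeout is re-armed.
        System.out.println(nextTimeoutOrAnr(1000, 10_000, 5000));
    }
}
```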

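Locating the problem

Now that we have a rough idea of how the different kinds of ANR arise, how do we track one down?

dumpsys

  1. Capture current CPU usage:
    adb shell dumpsys cpuinfo

  2. Capture current memory usage:
    adb shell dumpsys meminfo

  3. Capture the current state of activities and services:
    adb shell dumpsys activity
    adb shell service list

Use dumpsys to get a rough picture of the overall environment.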

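logcat

Search for the am_anr and anr keywords. If you are lucky, you will find the process that ANRed
(several times I have seen the ANR dialog on screen with no ANR entries in logcat at all).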
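Filtering a saved logcat dump for ANR entries can be scripted. The sketch below assumes the log contains lines of the form "ANR in <process name>" and pulls out the process names; AnrLogFilter is a hypothetical helper and the sample lines are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnrLogFilter {
    // Captures the first whitespace-delimited token after "ANR in ".
    private static final Pattern ANR_IN = Pattern.compile("ANR in (\\S+)");

    /** Returns the process names from every line that contains "ANR in ...". */
    static List<String> anrProcesses(List<String> logLines) {
        List<String> processes = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = ANR_IN.matcher(line);
            if (m.find()) {
                processes.add(m.group(1));
            }
        }
        return processes;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not output captured from a real device.
        List<String> lines = Arrays.asList(
                "I/ExampleTag: something unrelated",
                "E/ActivityManager: ANR in com.example.app (com.example.app/.MainActivity)");
        System.out.println(anrProcesses(lines));
    }
}
```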

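traces.txt

/data/anr/traces.txt
In general, the process that ANRed appears at the top of the traces log, along with information on all of its threads, including their stack traces.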
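Since the ANRed process usually comes first, a quick first step is to pull out the stack of its "main" thread. The sketch below assumes each thread block starts with a quoted thread name, that stack frames start with "at ", and that blocks are separated by blank lines; TracesMainThread is a hypothetical helper and the sample text is made up.

```java
import java.util.ArrayList;
import java.util.List;

final class TracesMainThread {
    /** Returns the "at ..." frames of the first "main" thread block in the text. */
    static List<String> mainThreadFrames(String traces) {
        List<String> frames = new ArrayList<>();
        boolean inMain = false;
        for (String line : traces.split("\n")) {
            String trimmed = line.trim();
            if (!inMain) {
                // Skip everything until the header line of the main thread.
                inMain = trimmed.startsWith("\"main\" ");
                continue;
            }
            // A blank line or the next quoted thread header ends the block.
            if (trimmed.isEmpty() || trimmed.startsWith("\"")) {
                break;
            }
            if (trimmed.startsWith("at ")) {
                frames.add(trimmed);
            }
        }
        return frames;
    }

    public static void main(String[] args) {
        // Made-up sample in the assumed layout, not a real dump.
        String sample = "----- pid 1234 at 2021-07-17 10:00:00 -----\n"
                + "Cmd line: com.example.app\n"
                + "\n"
                + "\"main\" prio=5 tid=1 Sleeping\n"
                + "  at java.lang.Thread.sleep(Native method)\n"
                + "  at com.example.app.MainActivity.onCreate(MainActivity.java:42)\n"
                + "\n"
                + "\"Binder:1234_1\" prio=5 tid=8 Native\n";
        for (String frame : mainThreadFrames(sample)) {
            System.out.println(frame);
        }
    }
}
```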

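Using platforms such as Bugly

ANRs generally happen because our app is not robust enough. Bugly can help locate the problem fairly quickly, but it is no silver bullet.

Thanks

Many thanks to the author of the following article, which gave me an overall understanding of ANR: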

https://duanqz.github.io/2015-10-12-ANR-Analysis#service

Original article by 奋斗. If you repost it, please credit the source: https://blog.ytso.com/6265.html
