Reliable Connection to AIDL IPC Bound Service in Android

The following statement opens the official Services documentation page:

A Service is an application component that can perform long-running operations in the background, and it does not provide a user interface

There are two mechanisms by which an application’s logic can interact with a Service: starting it or binding to it. In this post we are concerned only with bound Services. The official documentation on bound Services starts with the following statement:

A bound service is the server in a client-server interface. It allows components (such as activities) to bind to the service, send requests, receive responses, and perform interprocess communication (IPC)

We are going to take a deep dive into bound IPC Services (i.e. Services that run in a different process) and discuss how exactly we can establish a reliable connection to Services in other processes.

Android Service Lifecycle

An Android Service’s lifecycle is tricky – it can be started, bound, or both. Furthermore, a Service’s lifecycle changes depending on the value it returns from its onStartCommand() method and on the flags used in the bindService() call.

The official documentation on bound services provides only very basic information, which isn’t sufficient for developers to implement reliable services, and even contains some misleading statements (as will be discussed later in this post). This post attempts to fill that gap by looking at the connection to bound IPC Services from the client’s point of view.

Bound Service Connection’s Lifecycle vs Service Lifecycle

From the client’s point of view, the “effective” lifecycle of an IPC Service is shorter than the actual one – it starts when the client calls bindService() and ends when the client calls unbindService(). The client need not know whether the Service existed before the client attempted to bind to it, or whether the Service is going to die once the client unbinds from it.

Although the “effective” lifecycle (as seen from the client’s point of view) is shorter, that does not make it simpler. For instance, a bound IPC Service can crash or be killed by the OS (in which case it might or might not be restarted later, as we’ll discuss shortly). What implications does a Service crash have for the clients that are bound to it? It turns out that clients can and should handle this situation gracefully, and that a Service crash, as seen from the client’s point of view, might be either a recoverable or an irrecoverable error, depending on the circumstances.

Lifecycle of a Connection to a Bound IPC Service

The lifecycle of a connection to a bound IPC Service (i.e. the “effective” lifecycle of the Service from the client’s point of view) is summarized in the following diagram:

To keep the following discussion clear, let’s define three types of errors which can occur while the client is connecting to, or connected to, an IPC Service:

Recoverable error: an error from which the client can recover while the Service is bound (without making a rebind attempt).
Irrecoverable error: an error from which the client can’t recover while the Service is bound – to handle an irrecoverable error, the client needs to rebind to the Service (unbind and bind again).
Fatal error: an error after which the client should assume that the Service won’t be available.
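
The three error types map onto three different client reactions. The following is a minimal, framework-free sketch of that mapping. The ErrorKind and Reaction enums and the reactTo() method are hypothetical names made up for illustration; they are not part of IpcServiceConnector or of the Android SDK.

```java
class ErrorPolicySketch {

    /** The three error types defined above (the enum itself is hypothetical). */
    enum ErrorKind { RECOVERABLE, IRRECOVERABLE, FATAL }

    /** What the client does next (hypothetical names). */
    enum Reaction { KEEP_WAITING, REBIND, GIVE_UP }

    static Reaction reactTo(ErrorKind kind) {
        switch (kind) {
            case RECOVERABLE:
                // The Service is still bound; the connection may be restored on its own.
                return Reaction.KEEP_WAITING;
            case IRRECOVERABLE:
                // Nothing will change while bound: unbind and bind again.
                return Reaction.REBIND;
            default:
                // FATAL: assume the Service won't be available.
                return Reaction.GIVE_UP;
        }
    }

    public static void main(String[] args) {
        for (ErrorKind kind : ErrorKind.values()) {
            System.out.println(kind + " -> " + reactTo(kind));
        }
    }
}
```

In practice the client usually cannot tell a recoverable error from an irrecoverable one up front, so a timeout on the wait is what turns "keep waiting" into "rebind".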

Looking at the above diagram, we can see “paths” having three different colors. The color codes mean:

Green: this “path” corresponds to the normal flow – the client binds to the Service, uses it and unbinds, either without encountering any errors along the way or while handling the encountered errors correctly.
Orange: this “path” corresponds to the sub-flow of recoverable errors – the connection to the Service was lost at some point, but was restored later.
Red: this “path” corresponds to the sub-flow of fatal errors – either the Service couldn’t be bound, or the connection was lost and never restored and the client couldn’t handle this irrecoverable error correctly.

It is also important to note that the connection’s lifecycle can be “terminated” in any of three states:

STATE_UNBOUND: this is the terminating state when the client itself calls unbindService().
STATE_BINDING_FAILED: this is the terminating state when the client couldn’t bind to the IPC Service at all. This is a fatal error, and IMHO there is no reason to attempt to rebind to the Service.
STATE_BOUND_DISCONNECTED: this is the terminating state when an irrecoverable error took place but the client wasn’t designed to handle it correctly – the connection remains stuck in this state indefinitely, or until one of the client’s lifecycle callbacks (e.g. onStop()) is invoked.
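
One way to picture how these states could be derived from the framework's signals is the plain-Java sketch below. It is not the actual IpcServiceConnector implementation: the class, the enum and the method names are assumptions, and the transitions are inferred from the description above (result of the bind call, connected/disconnected callbacks, unbind).

```java
class ConnectionStateSketch {

    /** Mirrors the state names used in this post; the enum form is an assumption. */
    enum State {
        NONE, BOUND_WAITING_FOR_CONNECTION, BOUND_CONNECTED,
        BOUND_DISCONNECTED, UNBOUND, BINDING_FAILED
    }

    private State state = State.NONE;

    synchronized State getState() {
        return state;
    }

    /** Call with the boolean returned by bindService(). */
    synchronized void onBindResult(boolean bound) {
        state = bound ? State.BOUND_WAITING_FOR_CONNECTION : State.BINDING_FAILED;
    }

    /** Counterpart of onServiceConnected(); may happen several times per binding. */
    synchronized void onConnected() {
        if (isBound()) state = State.BOUND_CONNECTED;
    }

    /** Counterpart of onServiceDisconnected(); the Service stays bound. */
    synchronized void onDisconnected() {
        if (isBound()) state = State.BOUND_DISCONNECTED;
    }

    /** Call when the client invokes unbindService(). */
    synchronized void onUnbind() {
        if (isBound()) state = State.UNBOUND;
    }

    /** "Bound" spans all three BOUND_* states, regardless of connectivity. */
    synchronized boolean isBound() {
        return state == State.BOUND_WAITING_FOR_CONNECTION
                || state == State.BOUND_CONNECTED
                || state == State.BOUND_DISCONNECTED;
    }

    public static void main(String[] args) {
        ConnectionStateSketch c = new ConnectionStateSketch();
        c.onBindResult(true);   // bindService() returned true
        c.onConnected();        // first connection
        c.onDisconnected();     // Service crashed or was killed
        System.out.println(c.getState() + ", bound=" + c.isBound());
        c.onConnected();        // connection restored without rebinding
        c.onUnbind();
        System.out.println(c.getState() + ", bound=" + c.isBound());
    }
}
```

The point of the sketch is that "bound" and "connected" are separate notions: the connection can go through several connected/disconnected transitions while the Service stays bound.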

Connection’s Lifecycle Handling

So far, we saw that in addition to the non-trivial lifecycle of a bound IPC Service, there is also a non-trivial lifecycle of the client’s connection to it, which makes writing a reliable client challenging.

Since the states shown in the above diagram are abstract (in the sense that the framework does not expose them as constants or enums), I implemented the IpcServiceConnector class, which wraps the connection, derives its state, and exposes this information to the outside world as state constants. It actually does a bit more than that, as summarized in its javadoc.

Note that IpcServiceConnector was designed to be used from background worker threads rather than the main UI thread, because some of its methods are blocking.

The method that allows us to implement reliable clients is waitForState(). By invoking this method, you can make a thread wait until the connection transitions to a specific state:

    /**
     * Call to this method will block the calling thread until this connector transitions to the
     * specified state, or until the specified amount of time passes. If the connector is already
     * in the requested state then this method returns immediately.
     *
     * NOTE: {@link ServiceConnection#onServiceConnected(ComponentName, IBinder)} and
     * {@link ServiceConnection#onServiceDisconnected(ComponentName)} will be invoked BEFORE
     * threads which are waiting due to calls to this method are unblocked. This allows you to
     * use ServiceConnection's callbacks in order to perform the required setup before the
     * execution of the blocked threads continues.
     *
     * This method MUST NOT be called from UI thread.
     * @param targetState IpcServiceConnector's state in which the calling thread should be
     *                    unblocked. Should be either of:
     *                    {@link #STATE_NONE}
     *                    {@link #STATE_BOUND_WAITING_FOR_CONNECTION}
     *                    {@link #STATE_BOUND_CONNECTED}
     *                    {@link #STATE_BOUND_DISCONNECTED}
     *                    {@link #STATE_UNBOUND}
     *                    {@link #STATE_BINDING_FAILED}
     *
     * @param blockingTimeout the period of time (in milliseconds) after which the calling thread will
     *                        be unblocked (regardless of the state of this IpcServiceConnector)
     * @return true if target state was reached; false otherwise
     */
    @WorkerThread
    public boolean waitForState(int targetState, int blockingTimeout) {
        ...
    }
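
To illustrate the contract described in the javadoc, here is one way such a method could be built on a plain Java monitor. This is only a sketch under assumptions: the class name, the integer state values and setState() are hypothetical, and the real IpcServiceConnector may be implemented differently.

```java
class WaitForStateSketch {

    private final Object lock = new Object();
    private int state = 0; // hypothetical integer state constant

    /** Called whenever the connection transitions to a new state. */
    void setState(int newState) {
        synchronized (lock) {
            state = newState;
            lock.notifyAll(); // wake all waiters so each can re-check its target state
        }
    }

    /** Blocks until targetState is reached or blockingTimeout (ms) elapses. */
    boolean waitForState(int targetState, int blockingTimeout) {
        final long deadline = System.nanoTime() + blockingTimeout * 1_000_000L;
        synchronized (lock) {
            // Loop guards against spurious wakeups and transitions to other states.
            while (state != targetState) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    return false; // timed out (also avoids wait(0), which waits forever)
                }
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupted status
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        WaitForStateSketch connector = new WaitForStateSketch();
        new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            connector.setState(2); // simulate "connected" arriving from another thread
        }).start();
        System.out.println("reached state 2: " + connector.waitForState(2, 2000));
        System.out.println("reached state 3: " + connector.waitForState(3, 50));
    }
}
```

In a design like this, the ordering promised in the NOTE could be obtained by invoking the client's ServiceConnection callback first and only then calling setState().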

In the tutorial application (source here) the client is an Activity that should display an accurate date and time on the screen (I chose an Activity instead of a Service in order to simplify the code, though I’m not sure it worked out that way). The date is provided by a Service which is exposed through AIDL and runs in a separate child process (this is the closest we can get to simulating a real IPC Service in a single application). Besides fetching an accurate date and time from the Service, the client can also make the Service crash by calling a special pre-defined method. App users can trigger this crash with a dedicated button:

[Screenshot: IPC service connector tutorial application]

The client’s code is heavily commented, so I’ll post it here for quick reference:

public class MainActivity extends AppCompatActivity {

    private static final String TAG = "MainActivity";

    private static final int CONNECTION_TIMEOUT = 5000; // ms

    private static final long DATE_REFRESH_INTERVAL = 100; // ms

    private final ServiceConnection mServiceConnection = new ServiceConnection() {

        @Override
        public void onServiceConnected(ComponentName name, IBinder service) {
            Log.d(TAG, "onServiceConnected()");
            mDateProvider = IDateProvider.Stub.asInterface(service);
            mBtnCrashService.setEnabled(true);
        }

        @Override
        public void onServiceDisconnected(ComponentName name) {
            Log.d(TAG, "onServiceDisconnected()");
            mBtnCrashService.setEnabled(false);
            mDateProvider = null;
        }
    };

    private IpcServiceConnector mIpcServiceConnector;
    // volatile: assigned/cleared on the UI thread, read on the worker thread
    private volatile IDateProvider mDateProvider;

    private final DateMonitor mDateMonitor = new DateMonitor();

    private TextView mTxtDate;
    private Button mBtnCrashService;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        mIpcServiceConnector = new IpcServiceConnector(this, "DateProviderConnector");

        mTxtDate = (TextView) findViewById(R.id.txt_date);
        mBtnCrashService = (Button) findViewById(R.id.btn_crash_service);

        mBtnCrashService.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                try {
                    mDateProvider.crashService();
                } catch (RemoteException e) {
                    e.printStackTrace();
                }
            }
        });
    }

    @Override
    protected void onStart() {
        super.onStart();
        Log.d(TAG, "onStart(); binding and connecting to IPC service");
        if (!bindDateProviderService()) {
            // service couldn't be bound - handle this error by disabling the logic which depends
            // on this service (in this case we will do it in onResume())
        }
    }

    @Override
    protected void onStop() {
        super.onStop();
        Log.d(TAG, "onStop(); unbinding IPC service");
        mIpcServiceConnector.unbindIpcService();
    }

    @Override
    protected void onResume() {
        super.onResume();
        if (mIpcServiceConnector.isServiceBound()) {
            Log.d(TAG, "onResume(); starting date monitor");
            mDateMonitor.start();
        } else {
            Log.e(TAG, "onResume(); IPC service is not bound - aborting date monitoring");
        }
    }

    @Override
    protected void onPause() {
        super.onPause();
        Log.d(TAG, "onPause(); stopping date monitor");
        mDateMonitor.stop();
    }

    private boolean bindDateProviderService() {
        return mIpcServiceConnector.bindAndConnectToIpcService(
                new Intent(this, DateProviderService.class),
                mServiceConnection,
                Context.BIND_AUTO_CREATE);
    }

    private class DateMonitor {

        private final Runnable mConnectionInProgressNotification = new Runnable() {
            @Override
            public void run() {
                mTxtDate.setText("connecting to IPC service...");
            }
        };

        private final Runnable mDateNotification = new Runnable() {
            @Override
            public void run() {
                mTxtDate.setText(mCurrentDate);
            }
        };

        // volatile: written on the worker thread, read on the UI thread
        private volatile String mCurrentDate = "-";
        private boolean mConnectionFailure = false;

        private final Handler mMainHandler = new Handler(Looper.getMainLooper());

        private Thread mWorkerThread;
        private Thread mOldWorkerThread;

        public void start() {

            // make sure we stop the worker thread, but keep reference to it
            if (mWorkerThread != null) {
                mOldWorkerThread = mWorkerThread;
                stop();
            }

            mWorkerThread = new Thread(new Runnable() {
                @Override
                public void run() {

                    // make this thread wait until the old thread dies
                    if (mOldWorkerThread != null) {
                        try {
                            mOldWorkerThread.join();
                        } catch (InterruptedException e) {
                            // set the interrupted status back (it was cleared in join())
                            Thread.currentThread().interrupt();
                        }
                        mOldWorkerThread = null;
                    }

                    mConnectionFailure = false;
                    mCurrentDate = "-";

                    // loop until interrupted
                    while (!Thread.currentThread().isInterrupted()) {

                        updateDate();

                        try {
                            Thread.sleep(DATE_REFRESH_INTERVAL);
                        } catch (InterruptedException e) {
                            // set the interrupted status back (it was cleared in sleep())
                            Thread.currentThread().interrupt();
                        }
                    }
                }
            });
            mWorkerThread.start();
        }

        public void stop() {
            if (mWorkerThread != null) {
                mWorkerThread.interrupt();
                mWorkerThread = null;
            }
        }

        @WorkerThread
        private void updateDate() {

            /*
             We don't want the date displayed being stuck if we ever need to wait for connection,
             therefore we show informative notification.
             The notification should be cancelled if the service is connected
             */
            if (!mConnectionFailure) {
                mMainHandler.postDelayed(mConnectionInProgressNotification, 100);
            }

            // this call can block the worker thread for up to CONNECTION_TIMEOUT milliseconds
            if (mIpcServiceConnector.waitForState(IpcServiceConnector.STATE_BOUND_CONNECTED,
                    CONNECTION_TIMEOUT)) { // IPC service connected

                mConnectionFailure = false;
                mMainHandler.removeCallbacks(mConnectionInProgressNotification);

                try {
                    mCurrentDate = mDateProvider.getDate();
                } catch (RemoteException e) {
                    // this exception can still be thrown (e.g. service crashed, but the system hasn't
                    // notified us yet)
                    mCurrentDate = "-";
                    e.printStackTrace();
                } catch (NullPointerException e) {
                    /*
                     Since mDateProvider is assigned/cleared on UI thread, but is being used on
                     worker thread, there is a chance of race condition that will result in NPE.
                     We could either add synchronization, or catch NPE - I chose the latter in
                     order to simplify the (already complex) example
                     */
                    mCurrentDate = "-";
                    e.printStackTrace();
                }
            } else { // could not connect to IPC service
                Log.e(TAG, "connection attempt timed out - attempting to rebind to the service");
                notifyUserConnectionAttemptFailed();

                /*
                 Connection error handling here. I just attempt to rebind to the service, but a real
                 error handling could also employ some extrapolation of cached data, etc.
                 If this is a fatal error from your application's point of view, then unbind from
                 the service and stop the worker thread.
                  */

                mConnectionFailure = true;

                mIpcServiceConnector.unbindIpcService();
                if (!bindDateProviderService()) {
                    Log.e(TAG, "rebind attempt failed - stopping DateMonitor completely");
                    DateMonitor.this.stop();
                }

                return;
            }

            mMainHandler.post(mDateNotification);
        }

        private void notifyUserConnectionAttemptFailed() {
            MainActivity.this.runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    Toast.makeText(
                            MainActivity.this,
                            "connection attempt timed out - rebinding",
                            Toast.LENGTH_LONG)
                            .show();
                }
            });
        }
    }
}

The main part of the code is the inner class DateMonitor. This class spawns a new thread, makes sure that the IPC Service is connected, queries the Service for data and handles all the errors (except for a single fatal error which occurs if the initial binding fails).

You can download the source of the tutorial application, install it and play around. This will allow you to discover the system’s policy concerning bound but crashing IPC Services. You can also use DDMS to kill the child process in which the Service runs, thus simulating the Service being killed by the OS. You might be surprised to find out that the system treats killed and crashed Services very differently.
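
The NullPointerException catch in updateDate() covers a race on mDateProvider, which is cleared on the UI thread while the worker thread uses it. The alternative mentioned in the code comment can be sketched without Android dependencies as follows. The DateProvider interface is a hypothetical stand-in for the AIDL-generated IDateProvider, and the class and method names are assumptions made for this example.

```java
class SnapshotSketch {

    /** Hypothetical stand-in for the AIDL-generated IDateProvider. */
    interface DateProvider {
        String getDate();
    }

    // volatile: written by the "UI" thread, read by the worker thread
    private volatile DateProvider dateProvider;

    void onServiceConnected(DateProvider provider) {
        dateProvider = provider;
    }

    void onServiceDisconnected() {
        dateProvider = null;
    }

    /** Called from the worker thread. */
    String readDate() {
        // Read the shared field exactly once; the local copy cannot become null
        // between the check and the call, even if the field is cleared meanwhile.
        DateProvider local = dateProvider;
        if (local == null) {
            return "-";
        }
        return local.getDate();
    }

    public static void main(String[] args) {
        SnapshotSketch client = new SnapshotSketch();
        System.out.println(client.readDate());            // not connected yet
        client.onServiceConnected(() -> "some date");
        System.out.println(client.readDate());            // connected
        client.onServiceDisconnected();
        System.out.println(client.readDate());            // disconnected again
    }
}
```

With a real AIDL proxy the call can still throw RemoteException if the remote process has died, so that catch block is needed either way; the snapshot only removes the need to catch NullPointerException.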

Shell users can kill the child process with a command similar to this one:

 adb shell ps | grep childProcess | perl -ane 'END {print @F[1]}' | xargs adb shell kill

Conclusion

In this post we saw that, though far from trivial, it is possible to implement clients that establish reliable, crash- and kill-tolerant connections to IPC Services. This becomes possible once we “distill” the states of the connections and understand how various transitions between these states should be handled. We also discovered that the system treats crashed and killed bound IPC Services differently.


5 comments on "Reliable Connection to AIDL IPC Bound Service in Android"

Fernando

May 19, 2017 at 2:31 pm

Thank you for your insight, I am developing an app with a Service in its own process for the first time and wondered how to handle service crashes. Will do my own tests based on your work.
- Vasiliy
  
  May 19, 2017 at 2:41 pm
  
  Hello Fernando.
  Thank you for your comment – it is a first sign that this post was actually helpful to anybody 🙂
  
  I didn’t test this code on post-M devices, therefore it would be great if you share your perspective when you’re done.
  
  Also, please don’t hesitate to contact me directly on any issue you might encounter (techyourchance@gmail.com).
Alexey Romanov

July 18, 2017 at 10:21 am

You say in the beginning “Official documentation on bound services … contains plainly wrong statements (as will be discussed later in this post).” However, I couldn’t find what this refers to even after rereading the post.
- Vasiliy
  
  July 18, 2017 at 2:09 pm
  
  You’re right – this statement is not being clarified in the rest of the article. Furthermore, I read the documentation again, and it looks like it was updated to a more consistent state.
  
  If I remember correctly, there were two aspects that I referred to as “plainly wrong statements”:
  
  1) The documentation defined a “bound” state incorrectly. You can still see the artifact of this incorrect definition in the example code at the end: variable mBound is being set to true in onServiceConnected callback, and to false in onServiceDisconnected. The actual “bound” state starts when bindService call returns true, and ends when unbindService is called. The service can remain in “bound” state while the connection undergoes multiple connected/disconnected transitions (this is demonstrated in the tutorial app).
  
  2) The documentation stated that the onServiceConnected method will be called in response to a successful bindService call. This is not exactly correct – it will be called each time a connection to the bound service is established. If the service behaves well, then it will indeed be called just once after bindService, but if the service crashes or is killed by the OS, it might be called multiple times (when the service is restored and reconnected).
  
  In any case, your question caused me to read this post again, and I found both the explanations and the code to be unclear. I guess I need to find the time to improve them.
  
  Thanks for bringing this up.
  - Alexey Romanov
    
    July 19, 2017 at 12:40 pm
    
    Thank you for the answer.