This post describes how to establish a reliable connection to an AIDL Service running in a different process on Android.

Android Service:

The following statement opens the official Services documentation page:

A Service is an application component that can perform long-running operations in the background, and it does not provide a user interface

There are two mechanisms by which an application’s logic can interact with a Service: it can start the Service, or bind to it. In this post we are only concerned with bound Services. The official documentation on Bound Services starts with the following statement:

A bound service is the server in a client-server interface. It allows components (such as activities) to bind to the service, send requests, receive responses, and perform interprocess communication (IPC)

We are going to take a deep dive into bound IPC Services (i.e. Services that run in a different process), and discuss how exactly we can establish a reliable connection to Services in other processes.

Android Service life-cycle:

Android Service’s life-cycle is tricky – it can be started, bound, or both. Furthermore, Service’s life-cycle changes depending on the value it returns from its onStartCommand() method, and depending on the flags used in bindService() call.

The official documentation on bound services provides only very basic information, which isn’t sufficient for developers to implement reliable services, and even contains plainly wrong statements (these are spelled out in my reply in the comments below this post). While many blogs, tutorials and StackOverflow questions attempt to provide additional information on Services, almost no source discusses how exactly the clients that bind to Services perceive the Services’ life-cycle, and how to account for that life-cycle on the client’s side of the connection. This post attempts to provide that crucial piece of information by looking at the connection to bound IPC Services from the client’s point of view.

How bound IPC Service connection’s life-cycle is different from life-cycle of the Service itself:

From the client’s point of view, the “effective” life-cycle of an IPC Service is shorter than the actual one – it starts when the client executes the bindService() call, and ends when the client calls unbindService(). The client shouldn’t need to know whether the Service existed before the client attempted to bind to it, or whether the Service is going to die once the client unbinds from it [side note: clients shouldn’t be aware, but sometimes developers force their clients to be aware (which is a bad idea); nothing else could explain the popularity of this StackOverflow answer].

Although the “effective” life-cycle (as seen from the client’s point of view) is shorter, that does not make it simpler. For instance, it is widely known that bound IPC Services can crash or be killed by the OS (in which case the Service might, or might not, be restarted by the OS – this will be discussed later), but almost no source addresses what a Service crash implies for the clients that are bound to it. It turns out (as will be discussed shortly) that clients can and should handle this situation gracefully, and that a Service crash, as seen from the client’s point of view, can be either a recoverable or an irrecoverable error, depending on circumstances.

What is the exact life-cycle of a connection to bound IPC Service:

The life-cycle of a connection to bound IPC Service (the “effective” life-cycle of the Service from client’s point of view) is summarized in the following diagram:

Connection state diagram

To keep the following discussion clear, let’s define three types of errors which can occur during the life-cycle of a client’s connection to an IPC Service:

  • Recoverable error: an error from which the client can recover while the Service is bound (without making a rebind attempt).
  • Irrecoverable error: an error from which the client can’t recover while the Service is bound – in order to handle an irrecoverable error the client needs to rebind the Service (unbind and bind again).
  • Fatal error: an error after which the client should assume that the Service won’t be available.
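
One way to make these three definitions concrete is to map each of them to a client reaction: keep waiting on a recoverable error, rebind on an irrecoverable one, and give up on a fatal one. The sketch below does this against a hypothetical `Connector` interface. The names `Connector`, `Outcome` and `ConnectionPolicy` are illustrative and are not part of the tutorial application, and treating a failed rebind as fatal is an assumed policy, only one of several reasonable ones.

```java
/** Hypothetical minimal view of a connector; not the tutorial's IpcServiceConnector API. */
interface Connector {
    /** Blocks until the Service is connected or the timeout elapses. */
    boolean waitForConnected(long timeoutMs);

    void unbind();

    /** Returns false if the Service could not be bound at all. */
    boolean bind();
}

/** Illustrative result of one attempt to make sure the Service is usable. */
enum Outcome { CONNECTED, REBOUND, FATAL }

final class ConnectionPolicy {

    private ConnectionPolicy() {}

    static Outcome ensureConnected(Connector connector, long timeoutMs) {
        // No error, or a recoverable one: the connection is (re)established
        // while the Service stays bound, so waiting is enough.
        if (connector.waitForConnected(timeoutMs)) {
            return Outcome.CONNECTED;
        }
        // Irrecoverable error: the connection did not come back in time, so rebind.
        connector.unbind();
        if (connector.bind()) {
            return Outcome.REBOUND; // the caller should wait for the connection again
        }
        // Fatal error (assumed policy): binding itself failed, stop using the Service.
        return Outcome.FATAL;
    }
}
```

A caller would typically invoke `ensureConnected()` from a worker thread before each remote call and stop its work loop when it gets `Outcome.FATAL`.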

Looking at the above diagram, we can see “paths” having three different colors. The color codes mean:

  • Green: this “path” corresponds to normal flow – the client binds to the Service, uses it and unbinds while either not encountering any errors along the way, or handling the encountered errors correctly.
  • Orange: this “path” corresponds to sub-flow of recoverable errors – the connection to Service had been lost at some point, but was restored later.
  • Red: this “path” corresponds to sub-flow of fatal error – either the service couldn’t be bound, or the connection was lost and never restored, and the client couldn’t handle this irrecoverable error correctly.

It is also important to note that the connection’s life-cycle can “terminate” in any of three states:

  • STATE_UNBOUND: this would be a terminating state in case the client calls unbindService() by itself.
  • STATE_BINDING_FAILED: this would be a terminating state in case the client couldn’t bind to IPC Service at all. This is a fatal error, and IMHO there is no reason to attempt to rebind the Service again.
  • STATE_BOUND_DISCONNECTED: this would be a terminating state in case an irrecoverable error took place, but the client wasn’t designed to handle it correctly – it remains stuck in this state indefinitely, or until one of the client’s life-cycle callbacks (e.g. onStop()) is invoked.
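
Since the diagram itself is not reproduced in text form, here is a minimal sketch of how the states could be derived from framework events. It reflects my reading of the states named in this post; the transitions are assumptions rather than a transcript of the diagram, and `ConnectionStateTracker` is a hypothetical class, not the tutorial's IpcServiceConnector.

```java
/** Connection states, named after the STATE_* constants used in this post. */
enum ConnectionState {
    NONE,
    BOUND_WAITING_FOR_CONNECTION,
    BOUND_CONNECTED,
    BOUND_DISCONNECTED,
    UNBOUND,
    BINDING_FAILED
}

/** Hypothetical helper that derives the connection state from framework events. */
final class ConnectionStateTracker {

    private ConnectionState state = ConnectionState.NONE;

    synchronized ConnectionState getState() {
        return state;
    }

    /** Call with the value returned by bindService(). */
    synchronized void onBindResult(boolean bindSucceeded) {
        if (isBound()) {
            throw new IllegalStateException("already bound: " + state);
        }
        state = bindSucceeded
                ? ConnectionState.BOUND_WAITING_FOR_CONNECTION
                : ConnectionState.BINDING_FAILED;
    }

    /** Call from ServiceConnection.onServiceConnected(); may happen many times per bind. */
    synchronized void onServiceConnected() {
        if (isBound()) { // ignore callbacks that arrive after unbinding
            state = ConnectionState.BOUND_CONNECTED;
        }
    }

    /** Call from ServiceConnection.onServiceDisconnected(); the Service stays bound. */
    synchronized void onServiceDisconnected() {
        if (isBound()) {
            state = ConnectionState.BOUND_DISCONNECTED;
        }
    }

    /** Call right after unbindService(). */
    synchronized void onUnbind() {
        if (isBound()) {
            state = ConnectionState.UNBOUND;
        }
    }

    private boolean isBound() {
        return state == ConnectionState.BOUND_WAITING_FOR_CONNECTION
                || state == ConnectionState.BOUND_CONNECTED
                || state == ConnectionState.BOUND_DISCONNECTED;
    }
}
```

Note that "bound" and "connected" are tracked separately here: a crash moves the tracker from BOUND_CONNECTED to BOUND_DISCONNECTED and possibly back, without ever leaving the bound states.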

How can we account for connection’s life-cycle when writing clients of bound IPC Services:

So far we have seen that, in addition to the non-trivial life-cycle of a bound IPC Service, there is also the non-trivial life-cycle of the client’s connection to it, which makes writing a reliable client a real challenge.

Since the states shown in the above diagram are abstract (in the sense that the framework does not expose them as constants or enums), I began by implementing an IpcServiceConnector class which wraps the connection, derives its state, and exposes this information to the outside world as state constants. It actually does a bit more than that, but let me not repeat myself – it is all described in detail in its javadoc.

One important note is that IpcServiceConnector was designed to be used by background worker threads rather than the main UI thread – some of its methods block the calling thread, which is not acceptable on the UI thread. The rationale was simple – you can always offload work from the UI thread to background threads.

The method that allows us to implement reliable clients is waitForState() (by invoking this method we can make a background thread wait until the connection transitions to a specific state):


    /**
     * Call to this method will block the calling thread until this connector transitions to the
     * specified state, or until the specified amount of time passes. If the connector is already in
     * the requested state then this method returns immediately.
     *
     * NOTE: {@link ServiceConnection#onServiceConnected(ComponentName, IBinder)} and
     * {@link ServiceConnection#onServiceDisconnected(ComponentName)} will be invoked BEFORE
     * threads which are waiting due to calls to this method are unblocked. This allows you to
     * use ServiceConnection's callbacks in order to perform the required setup before the execution
     * of the blocked threads continues.
     *
     * This method MUST NOT be called from UI thread.
     * @param targetState IpcServiceConnector's state in which the calling thread should be
     *                    unblocked. Should be either of:
     *                    {@link #STATE_NONE}
     *                    {@link #STATE_BOUND_WAITING_FOR_CONNECTION}
     *                    {@link #STATE_BOUND_CONNECTED}
     *                    {@link #STATE_BOUND_DISCONNECTED}
     *                    {@link #STATE_UNBOUND}
     *                    {@link #STATE_BINDING_FAILED}
     *
     * @param blockingTimeout the period of time (in milliseconds) after which the calling thread will
     *                        be unblocked (regardless of the state of this IpcServiceConnector)
     * @return true if target state was reached; false otherwise
     */
    @WorkerThread
    public boolean waitForState(int targetState, int blockingTimeout) {
        ...
    }
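
The body of waitForState() is elided above. For readers who wonder how such a method can work, here is a minimal sketch of a timed wait on a state value, built on plain Java monitors. `StateLatch` is a hypothetical class written for this illustration; it is not the actual IpcServiceConnector implementation and leaves out the callback ordering guarantee described in the javadoc.

```java
/** Hypothetical stand-in for the state-holding part of a connector. */
final class StateLatch {

    private final Object lock = new Object();
    private int state;

    StateLatch(int initialState) {
        state = initialState;
    }

    /** Publishes a new state and wakes up every waiting thread. */
    void setState(int newState) {
        synchronized (lock) {
            state = newState;
            lock.notifyAll();
        }
    }

    /**
     * Blocks until the target state is reached or the timeout elapses.
     *
     * @return true if the target state was reached; false on timeout or interruption
     */
    boolean waitForState(int targetState, long timeoutMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        synchronized (lock) {
            // Loop to guard against spurious wakeups and unrelated state changes.
            while (state != targetState) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    return false; // timed out (also avoids wait(0), which waits forever)
                }
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    // Restore the interrupted status so the caller's loop can see it.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

One limitation of this simple approach: a waiter only observes the state that is current when it wakes up, so a state that is entered and left again very quickly can be missed.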

In the tutorial application (source here) the client is an Activity that should display an accurate date and time on the screen (I chose an Activity instead of a Service in order to simplify the code; I am not sure it worked out that way though). The date is provided by a Service which is exposed through AIDL and runs in a separate child process (this is the closest we can get to simulating a real IPC Service in a single application). Besides fetching an accurate date and time from the Service, the client can also make the Service crash by calling a special pre-defined method. This functionality is available to app users through a dedicated button:

IPC service connector tutorial screenshot

Since client’s code is also heavily commented, I will post it here instead of repeating myself:

public class MainActivity extends AppCompatActivity {

    private static final String TAG = "MainActivity";

    private static final int CONNECTION_TIMEOUT = 5000; // ms

    private static final long DATE_REFRESH_INTERVAL = 100; // ms

    private final ServiceConnection mServiceConnection = new ServiceConnection() {

        @Override
        public void onServiceConnected(ComponentName name, IBinder service) {
            Log.d(TAG, "onServiceConnected()");
            mDateProvider = IDateProvider.Stub.asInterface(service);
            mBtnCrashService.setEnabled(true);
        }

        @Override
        public void onServiceDisconnected(ComponentName name) {
            Log.d(TAG, "onServiceDisconnected()");
            mBtnCrashService.setEnabled(false);
            mDateProvider = null;
        }
    };

    private IpcServiceConnector mIpcServiceConnector;
    private IDateProvider mDateProvider;

    private final DateMonitor mDateMonitor = new DateMonitor();

    private TextView mTxtDate;
    private Button mBtnCrashService;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        mIpcServiceConnector = new IpcServiceConnector(this, "DateProviderConnector");

        mTxtDate = (TextView) findViewById(R.id.txt_date);
        mBtnCrashService = (Button) findViewById(R.id.btn_crash_service);

        mBtnCrashService.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                try {
                    mDateProvider.crashService();
                } catch (RemoteException e) {
                    e.printStackTrace();
                }
            }
        });
    }

    @Override
    protected void onStart() {
        super.onStart();
        Log.d(TAG, "onStart(); binding and connecting to IPC service");
        if (!bindDateProviderService()) {
            // service couldn't be bound - handle this error by disabling the logic which depends
            // on this service (in this case we will do it in onResume())
        }
    }

    @Override
    protected void onStop() {
        super.onStop();
        Log.d(TAG, "onStop(); unbinding IPC service");
        mIpcServiceConnector.unbindIpcService();
    }

    @Override
    protected void onResume() {
        super.onResume();
        if (mIpcServiceConnector.isServiceBound()) {
            Log.d(TAG, "onResume(); starting date monitor");
            mDateMonitor.start();
        } else {
            Log.e(TAG, "onResume(); IPC service is not bound - aborting date monitoring");
        }
    }

    @Override
    protected void onPause() {
        super.onPause();
        Log.d(TAG, "onPause(); stopping date monitor");
        mDateMonitor.stop();
    }

    private boolean bindDateProviderService() {
        return mIpcServiceConnector.bindAndConnectToIpcService(
                new Intent(this, DateProviderService.class),
                mServiceConnection,
                Context.BIND_AUTO_CREATE);
    }


    private class DateMonitor {


        private final Runnable mConnectionInProgressNotification = new Runnable() {
            @Override
            public void run() {
                mTxtDate.setText("connecting to IPC service...");
            }
        };

        private final Runnable mDateNotification = new Runnable() {
            @Override
            public void run() {
                mTxtDate.setText(mCurrentDate);
            }
        };

        private String mCurrentDate = "-";
        private boolean mConnectionFailure = false;

        private final Handler mMainHandler = new Handler(Looper.getMainLooper());

        private Thread mWorkerThread;
        private Thread mOldWorkerThread;

        public void start() {

            // make sure we stop the worker thread, but keep reference to it
            if (mWorkerThread != null) {
                mOldWorkerThread = mWorkerThread;
                stop();
            }

            mWorkerThread = new Thread(new Runnable() {
                @Override
                public void run() {

                    // make this thread wait until the old thread dies
                    if (mOldWorkerThread != null) {
                        try {
                            mOldWorkerThread.join();
                        } catch (InterruptedException e) {
                            // set the interrupted status back (it was cleared in join())
                            Thread.currentThread().interrupt();
                        }
                        mOldWorkerThread = null;
                    }

                    mConnectionFailure = false;
                    mCurrentDate = "-";

                    // loop until interrupted
                    while (!Thread.currentThread().isInterrupted()) {

                        updateDate();

                        try {
                            Thread.sleep(DATE_REFRESH_INTERVAL);
                        } catch (InterruptedException e) {
                            // set the interrupted status back (it was cleared in sleep())
                            Thread.currentThread().interrupt();
                        }
                    }
                }
            });
            mWorkerThread.start();
        }

        public void stop() {
            if (mWorkerThread != null) {
                mWorkerThread.interrupt();
                mWorkerThread = null;
            }
        }

        @WorkerThread
        private void updateDate() {

            /*
             We don't want the date displayed being stuck if we ever need to wait for connection,
             therefore we show informative notification.
             The notification should be cancelled if the service is connected
             */
            if (!mConnectionFailure) {
                mMainHandler.postDelayed(mConnectionInProgressNotification, 100);
            }

            // this call can block the worker thread for up to CONNECTION_TIMEOUT milliseconds
            if (mIpcServiceConnector.waitForState(IpcServiceConnector.STATE_BOUND_CONNECTED,
                    CONNECTION_TIMEOUT)) { // IPC service connected

                mConnectionFailure = false;
                mMainHandler.removeCallbacks(mConnectionInProgressNotification);

                try {
                    mCurrentDate = mDateProvider.getDate();
                } catch (RemoteException e) {
                    // this exception can still be thrown (e.g. service crashed, but the system hasn't
                    // notified us yet)
                    mCurrentDate = "-";
                    e.printStackTrace();
                } catch (NullPointerException e) {
                    /*
                     Since mDateProvider is assigned/cleared on UI thread, but is being used on
                     worker thread, there is a chance of race condition that will result in NPE.
                     We could either add synchronization, or catch NPE - I chose the latter in
                     order to simplify the (already complex) example
                     */
                    mCurrentDate = "-";
                    e.printStackTrace();
                }
            } else { // could not connect to IPC service
                Log.e(TAG, "connection attempt timed out - attempting to rebind to the service");
                notifyUserConnectionAttemptFailed();

                /*
                 Connection error handling here. I just attempt to rebind to the service, but a real
                 error handling could also employ some extrapolation of cached data, etc.
                 If this is a fatal error from your application's point of view, then unbind from
                 the service and stop the worker thread.
                 */

                mConnectionFailure = true;

                mIpcServiceConnector.unbindIpcService();
                if (!bindDateProviderService()) {
                    Log.e(TAG, "rebind attempt failed - stopping DateMonitor completely");
                    DateMonitor.this.stop();
                }

                return;
            }

            mMainHandler.post(mDateNotification);
        }

        private void notifyUserConnectionAttemptFailed() {
            MainActivity.this.runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    Toast.makeText(
                            MainActivity.this,
                            "connection attempt timed out - rebinding",
                            Toast.LENGTH_LONG)
                            .show();
                }
            });
        }
    }
}

The main part of the code is the inner class DateMonitor – this class spawns a new thread, makes sure that the IPC Service is connected, queries the Service for the date, and handles all the errors (except for the single fatal error which would occur if the initial binding didn’t succeed).
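
The comment in updateDate() mentions that synchronization would be an alternative to catching NullPointerException. A minimal sketch of that alternative is shown below: the reference is stored in an AtomicReference and read exactly once into a local variable before use. `DateSource` and `DateSourceHolder` are hypothetical names; `DateSource` stands in for the AIDL-generated IDateProvider, and the broad `Exception` stands in for android.os.RemoteException so that the sketch compiles outside of Android.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the AIDL-generated IDateProvider interface. */
interface DateSource {
    String getDate() throws Exception; // RemoteException in the real interface
}

/** Holds the remote interface so that it can be set on one thread and used on another. */
final class DateSourceHolder {

    private final AtomicReference<DateSource> ref = new AtomicReference<>();

    /** Call from onServiceConnected() (UI thread). */
    void onConnected(DateSource source) {
        ref.set(source);
    }

    /** Call from onServiceDisconnected() (UI thread). */
    void onDisconnected() {
        ref.set(null);
    }

    /** Call from the worker thread; never throws NullPointerException. */
    String readDateOrPlaceholder() {
        // Read the shared reference once; the local copy cannot become null afterwards.
        DateSource source = ref.get();
        if (source == null) {
            return "-";
        }
        try {
            return source.getDate();
        } catch (Exception e) {
            // The remote side may have died after we took the snapshot.
            return "-";
        }
    }
}
```

The remote call can still fail if the Service dies between the snapshot and the call, so the exception handler stays; only the null check replaces the NPE catch.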

I encourage you to download the source of the tutorial application, install it and play around. You will discover a very interesting system policy concerning bound, but crashing, IPC Services. You can also use DDMS in order to kill the child process in which the Service runs, thus simulating the Service being killed by the OS. You might be surprised to find out that the system treats killed and crashed Services very differently.

Happy shell users could kill the child process with a command similar to this one:

    adb shell ps | grep childProcess | perl -ane 'END {print @F[1]}' | xargs adb shell kill

and even automate this process in order to see what happens when the Service is being killed perpetually (and if you do that – please let me know about your results in comments).

Conclusion:

In this post we saw that, though far from trivial, it is possible to implement clients that establish reliable, crash and kill tolerant connections to IPC Services. This becomes possible once we “distill” the states of connections to bound IPC services, and understand how various transitions between these states should be handled. We also discovered that the system treats crashed and killed bound IPC Services differently.

Please leave your comments and questions below, and consider subscribing to our newsletter if you liked the post.

This article has 5 comments

  1. Fernando

    Thank you for your insight, I am developing an app with a Service in its own process for the first time and wondered how to handle service crashes. Will do my own tests based on your work.

    • Vasiliy

      Hello Fernando.
      Thank you for your comment – it is a first sign that this post was actually helpful to anybody 🙂

      I didn’t test this code on post-M devices, therefore it would be great if you share your perspective when you’re done.

      Also, please don’t hesitate to contact me directly on any issue you might encounter (techyourchance@gmail.com).

  2. Alexey Romanov

    You say in the beginning “Official documentation on bound services … contains plainly wrong statements (as will be discussed later in this post).” However, I couldn’t find what this refers to even after rereading the post.

    • Vasiliy

      You’re right – this statement is not being clarified in the rest of the article. Furthermore, I read the documentation again, and it looks like it was updated to a more consistent state.

      If I remember correctly, there were two aspects that I referred to as “plainly wrong statements”:

      1) The documentation defined a “bound” state incorrectly. You can still see the artifact of this incorrect definition in the example code at the end: variable mBound is being set to true in onServiceConnected callback, and to false in onServiceDisconnected. The actual “bound” state starts when bindService call returns true, and ends when unbindService is called. The service can remain in “bound” state while the connection undergoes multiple connected/disconnected transitions (this is demonstrated in the tutorial app).

      2) The documentation stated that the onServiceConnected method will be called in response to a successful bindService call. This is not exactly correct – it will be called each time a connection to the bound service is established. If the service behaves well, then it will indeed be called just once after bindService, but if the service crashes or is killed by the OS, it might be called multiple times (when the service is restored and reconnected).

      In any case, your question caused me to read this post again, and I found both the explanations and the code to be unclear. I guess I need to find a time to improve them.

      Thanks for bringing this up.
