Dissecting Siri

Siri is the voice recognition application used on Apple’s iPhones and iPads. The application captures the user’s voice and sends it to Apple servers in the cloud, which run voice recognition algorithms and return the transcribed text. Since Siri is a closed-source application, very few details about its operation are known. Still, it is the largest user of Multipath TCP, and for this reason it is worth discussing here.

A recent paper [Cavivlione] by Luca Caviglione briefly analyses the Siri application from a networking viewpoint. The paper looks at the sizes of the exchanged packets, tries to infer the type of data being exchanged, and measures the duration of the TCP connections. It discusses several scenarios in which the user dictates various sentences. Unfortunately, the paper was written before the release of iOS7, the first version to use Multipath TCP for Siri.

It would be interesting to perform similar tests with a recent version of Siri that uses Multipath TCP. Unfortunately, since the data is encrypted and part of it may be transmitted over the cellular network, this is more challenging than when a single TCP connection was used. Up to iOS6, the open-source SiriProxy could be used to intercept Siri messages and even to use them to trigger specific operations, e.g. for home automation. Unfortunately, SiriProxy no longer seems to be usable with iOS7, as discussed in detail at https://github.com/plamoni/SiriProxy/issues/542

References

[Cavivlione] Luca Caviglione, "A first look at traffic patterns of Siri", Transactions on Emerging Telecommunications Technologies, 2013, http://dx.doi.org/10.1002/ett.2697