Surviving Sensor Network Software Faults
Yang Chen, Omprakash Gnawali, Maria Kazandjieva, Philip Levis, and John Regehr
Published in Proceedings of the 22nd ACM Symposium on Operating System Principles (SOSP), November 2009.
We describe Neutron, a version of the TinyOS operating system that efficiently recovers from memory safety bugs. Where existing schemes reboot an entire node on an error, Neutron's compiler and runtime extensions divide programs into recovery units and reboot only the faulting unit. The TinyOS kernel itself is a recovery unit: a kernel safety violation appears to applications as the processor being unavailable for 10 to 20 milliseconds. Neutron further minimizes the cost of a safety violation by supporting precious state that persists across reboots. Application data, time synchronization state, and routing tables can all be declared as precious. Neutron's reboot sequence conservatively checks that precious state is not the source of a fault before preserving it. Together, recovery units and precious state allow Neutron to reduce the cost of a safety violation by 94% for time synchronization and by 99.5% for a routing protocol. Neutron also protects applications from losing data. Neutron provides this recovery on the very limited resources of a tiny, low-power microcontroller.
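The two ideas in the abstract, rebooting only the faulting recovery unit and keeping precious state only when it passes a conservative check, can be illustrated with a small C sketch. This is a hypothetical model, not Neutron's implementation: Neutron works through nesC compiler and runtime extensions, and the paper describes its actual precious-state check. The names `recovery_unit_t`, `precious_write`, and `reboot_unit`, the checksum, and the "dirty" flag are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: each recovery unit owns ordinary state, which is
   reinitialized on reboot, and precious state, which survives a reboot
   only if it passes an integrity check. */
#define NUM_UNITS 3
#define PRECIOUS_BYTES 8

typedef struct {
    uint8_t ordinary[16];             /* discarded on every unit reboot */
    uint8_t precious[PRECIOUS_BYTES]; /* e.g. a routing-table entry */
    uint16_t precious_sum;            /* checksum over precious[] */
    bool precious_dirty;              /* true while precious[] is being updated */
    uint32_t reboots;                 /* how many times this unit was rebooted */
} recovery_unit_t;

static recovery_unit_t units[NUM_UNITS]; /* zero-initialized at startup */

/* Simple multiplicative checksum (assumption; any integrity check works here). */
static uint16_t checksum(const uint8_t *p, size_t n) {
    uint16_t s = 0;
    for (size_t i = 0; i < n; i++)
        s = (uint16_t)(s * 31u + p[i]);
    return s;
}

/* Writers bracket each update with the dirty flag, so a fault that
   interrupts an update leaves evidence behind. */
void precious_write(int unit, size_t off, uint8_t val) {
    assert(unit >= 0 && unit < NUM_UNITS && off < PRECIOUS_BYTES);
    recovery_unit_t *u = &units[unit];
    u->precious_dirty = true;
    u->precious[off] = val;
    u->precious_sum = checksum(u->precious, PRECIOUS_BYTES);
    u->precious_dirty = false;
}

/* Called when a memory safety check fails inside `unit`. Only that unit
   is reset; other units keep running with their state intact. Returns
   true if the unit's precious state was judged safe and preserved. */
bool reboot_unit(int unit) {
    assert(unit >= 0 && unit < NUM_UNITS);
    recovery_unit_t *u = &units[unit];
    /* Conservative check: discard precious state if an update was in
       progress or the contents no longer match the checksum. */
    bool keep = !u->precious_dirty &&
                u->precious_sum == checksum(u->precious, PRECIOUS_BYTES);
    memset(u->ordinary, 0, sizeof u->ordinary);
    if (!keep) {
        memset(u->precious, 0, PRECIOUS_BYTES);
        u->precious_sum = checksum(u->precious, PRECIOUS_BYTES);
        u->precious_dirty = false;
    }
    u->reboots++;
    return keep;
}
```

In this model a fault in one unit never touches another unit's memory, and precious state is treated as suspect whenever there is any sign it could have caused or been damaged by the fault, mirroring the abstract's "conservatively checks" in a simplified form.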
BibTeX entry
@inproceedings{sosp09chen,
  author = "Yang Chen and Omprakash Gnawali and Maria Kazandjieva and Philip Levis and John Regehr",
  title = "{Surviving Sensor Network Software Faults}",
  booktitle = "{Proceedings of the 22nd ACM Symposium on Operating System Principles (SOSP)}",
  year = {2009},
  month = {November}
}