Meru Virtual World Architecture

People:

Stanford: Daniel Horn, Ewen Cheslack-Postava, Tahir Azim, Behram Mistree, Philip Levis
Princeton: Mike Freedman

Introduction:

We believe that one of the next major application platforms will be 3-dimensional, online virtual worlds. Virtual worlds are shared, interactive spaces. Objects inhabit the space, have programmable behaviors, and can discover and communicate with other objects. Users commonly experience the world through an avatar and can interact with objects in the world, much as they would in the real world. Many currently deployed applications are virtual worlds: multiplayer online games such as World of Warcraft and social environments such as Second Life are popular examples. Other examples include environments for virtual collaboration, distance learning applications, and augmented reality.

Unfortunately, the early evolution of virtual worlds has been ad hoc. Existing systems are built completely independently, share few architectural aspects, and offer little or no interoperability. Because these systems are designed for very specific applications, each suffers from at least one of the following: poor scalability, centralized control, or a lack of extensibility.

The Meru Project is designing and implementing an architecture for the virtual worlds of the future. By learning how to build applications and services before they are subject to the short-term necessities of commercial development, we hope to avoid many of the complexities other application platforms, such as the Web, have encountered.

Virtual World Components:

Meru addresses the issues of scalability and federation by carefully separating the components of a virtual world. The core components of any virtual world system are the simulation of the world, the simulation of individual object behaviors, and the storage and distribution of the content of the world. The Meru architecture separates these concerns:
  • Object Host: handles simulation of individual objects: receiving messages, handling events, and simulating behavior using user-created scripts.
  • Space: handles inter-object behavior: it gives objects names in the world, helps objects discover the names of other objects they might be interested in, enables communication between objects, and might provide physical simulation such as rigid body physics.
  • Persistence Services: handle storing and serving the large, read-mostly data a virtual world needs, such as textures and meshes.

Space Architecture:

Currently we are focusing on the architecture of space servers. They provide four basic services:
  • Naming - the space is a communication medium for objects, so it must give them names they can use to refer to each other. In this sense, the space can be thought of as an address space. We use a simple flat namespace and assign these identifiers randomly.
  • Discovery - in order to send messages, objects must have other objects' identifiers. The space must provide a way for objects to discover other object identifiers. Of course this mechanism should aim to return the identifiers of the most relevant objects to the querier.
  • Communication - the space acts as a communication medium and mediates all inter-object messages. It must provide routing of messages and possibly apply rules to restrict communication.
  • Physical Simulation - much physical simulation depends on the state of multiple objects, and is simplified by having a single authoritative simulator. The space may provide some form of physical simulation, ranging from simple collision detection and response to a complex rigid body simulation, depending on what the particular world calls for.
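
To make the first three services concrete, here is a minimal in-memory sketch. It is an illustration under stated assumptions, not Meru's implementation: the `Space` class and its `connect`, `discover`, and `send` methods are hypothetical names, discovery uses the simple distance cut-off that existing systems rely on, and physical simulation is left out entirely.

```python
import math
import uuid


class Space:
    """Toy in-memory space: naming, discovery, and mediated messaging."""

    def __init__(self):
        self.positions = {}  # object id -> (x, y, z)
        self.inboxes = {}    # object id -> list of (sender id, payload)

    def connect(self, position):
        """Naming: assign a random identifier from a flat namespace."""
        oid = uuid.uuid4().hex
        self.positions[oid] = position
        self.inboxes[oid] = []
        return oid

    def discover(self, querier, radius):
        """Discovery: distance cut-off query, nearest objects first."""
        origin = self.positions[querier]
        hits = []
        for oid, pos in self.positions.items():
            if oid == querier:
                continue
            distance = math.dist(origin, pos)
            if distance <= radius:
                hits.append((distance, oid))
        return [oid for _, oid in sorted(hits)]

    def send(self, sender, receiver, payload):
        """Communication: the space routes every message and may refuse it."""
        if sender not in self.positions or receiver not in self.positions:
            return False  # unknown endpoints cannot communicate
        self.inboxes[receiver].append((sender, payload))
        return True
```

An object would call `connect` once to obtain its identifier, use `discover` to learn the identifiers of nearby objects, and then pass those identifiers to `send`. Because every message goes through `send`, that method is also the natural place to apply rules that restrict communication.
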
We are designing our space service to provide all four of these services scalably. Some specific challenges we face in designing and implementing the components that provide them are:
  • Efficient and Scalable Discovery - most existing systems use a simple distance cut-off approach to discovery, where all objects within some radius are returned. This approach can be implemented efficiently when the radius is relatively small, but the radius must be large to find the majority of the objects that are important. Further, these systems return many objects that could be quickly discarded as irrelevant. We are investigating other types of queries that find important objects more selectively, yet can still be implemented efficiently and scalably.
  • Scalable Communication - without restrictions on communication, object messaging can very quickly overload the system, leading to poor service for everyone. We are investigating how we can control message queuing to provide physically plausible quality of service under load while making the best use of resources when not under load.
  • Load Balancing - interest naturally concentrates: we know that interests often follow a Zipf distribution. This implies that as the world grows, the maximum load on a single space server using fixed-size regions will increase. The resulting problems are already evident in most other systems that split their worlds into fixed-size regions: a few central hubs are overloaded, or the number of participants is simply capped to avoid the problem. Instead, we are investigating ways to balance the load by dynamically segmenting the world, allowing loaded servers to split in order to double available bandwidth and compute power, and underloaded servers to merge so unused regions do not waste resources.
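
The splitting half of the load-balancing idea can be sketched as follows. This is only an illustration under assumptions, not the segmentation scheme Meru uses: the `Region` class and `rebalance` function are hypothetical, the world is two-dimensional, load is approximated by object count, each split halves a region along its longer axis, and merging of underloaded regions is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Axis-aligned 2D region served by one hypothetical space server."""
    x0: float
    y0: float
    x1: float
    y1: float
    objects: list = field(default_factory=list)  # (x, y) object positions

    def split(self):
        """Halve the region along its longer axis and reassign its objects."""
        if self.x1 - self.x0 >= self.y1 - self.y0:
            mid = (self.x0 + self.x1) / 2
            low = Region(self.x0, self.y0, mid, self.y1)
            high = Region(mid, self.y0, self.x1, self.y1)
            axis = 0
        else:
            mid = (self.y0 + self.y1) / 2
            low = Region(self.x0, self.y0, self.x1, mid)
            high = Region(self.x0, mid, self.x1, self.y1)
            axis = 1
        for position in self.objects:
            target = low if position[axis] < mid else high
            target.objects.append(position)
        return low, high


def rebalance(regions, max_objects, min_size=1.0):
    """Split overloaded regions until each is within max_objects.

    Regions whose longer side is already at most min_size are left alone,
    so objects stacked on one point cannot cause endless splitting.
    """
    pending, balanced = list(regions), []
    while pending:
        region = pending.pop()
        longest = max(region.x1 - region.x0, region.y1 - region.y0)
        if len(region.objects) > max_objects and longest > min_size:
            pending.extend(region.split())
        else:
            balanced.append(region)
    return balanced
```

With a skewed population, the crowded corner of the world ends up covered by many small regions while the sparse remainder stays in a few large ones, which is the behavior that fixed-size regions cannot provide.
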

CURIS Project Ideas:

We are building an open source 3D virtual world system, Meru. The goal of Meru is to support planetary-scale virtual worlds, where millions of users can seamlessly interact in a shared, continuous environment.

To achieve these goals, Meru has a number of innovations that make it very different from existing game and social virtual world architectures. Meru separates communication and computation, has a unique mechanism for spatial partitioning, introduces a new networking substrate that has special facilities in place for object introduction, and integrates geometric properties (such as object size) into many of its optimizations.

Students working on Meru have the opportunity to research novel software systems and networking problems in collaboration with a team of graduate students.

Because Meru is such a broadly scoped project, there are many research problems to tackle. Some examples include:

  • A scalable sound mixer that takes input streams from many objects in the world, forms an aggregation hierarchy, and generates output streams to listening objects. Imagine being able to simulate the sounds of a virtual city: traffic is a continuous background of sounds, but notable sounds like sirens can still be heard. One interesting property of this problem is that sound is slow: this may allow, or even require, the system to introduce latency.
  • An update system so servers running one part of the world can efficiently notify other servers of important state changes. Such a system would enable Meru to merge with one of its offshoots, Sirikata (http://www.sirikata.com), and take advantage of Sirikata's more advanced client graphics while still keeping Meru's ability to scale to planetary-scale worlds.
  • Incorporate behavior simulation into virtual world clients. Traditional multiplayer games perform a lot of simulation on the client side to make the activity in the world appear smooth to the end user even though updates may be infrequent and have high latency. Most virtual worlds today do little or no client-side simulation. Because behavior in virtual worlds is much more flexible than in games and new behaviors are added every day, well-known techniques may not work well or may need to be generalized.
  • When building Meru, we have consistently run into an impedance mismatch between the transport semantics that system developers and scripters need and the available transport abstractions. Most games and virtual world systems use an ad hoc combination of UDP and TCP, or an ad hoc reliability layer built on top of UDP. In this project, the student will explore new transport abstractions that would be more effective for the workloads and requirements of virtual worlds. For example, virtual world updates are often ephemeral, so not every update needs to be reliable, but the last update is critical: we do not want to be stuck with an out-of-date value for a long time.
  • In order to be fault tolerant, object scripts in the system should be able to push their state to persistent storage. When a failure occurs, the system should be able to recover an up-to-date version of the object and reconnect it to the world. Some of the issues that this persistence layer must address are durability, atomicity, and consistency.
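
As one illustration of the transport semantics described in the transport project above, the sketch below shows the sender side of a "latest value wins" channel. It is a hypothetical design, not an existing Meru abstraction: the `LatestValueChannel` class, its method names, and the per-key sequence numbering are all assumptions. Intermediate updates are dropped as soon as a newer one arrives, but the newest value for each key is retransmitted until the receiver acknowledges it.

```python
class LatestValueChannel:
    """Sender side of a hypothetical 'latest value wins' transport."""

    def __init__(self):
        self.next_seq = 0
        self.pending = {}  # key -> (seq, value) not yet acknowledged

    def update(self, key, value):
        """Queue a new value; it supersedes any older unacknowledged one."""
        self.next_seq += 1
        self.pending[key] = (self.next_seq, value)

    def packets_to_send(self):
        """Return every unacknowledged value, oldest first.

        Call this on each send tick; entries repeat until acknowledged.
        """
        return sorted(
            (seq, key, value) for key, (seq, value) in self.pending.items()
        )

    def ack(self, key, seq):
        """Stop retransmitting only if the ack covers the newest value."""
        current = self.pending.get(key)
        if current is not None and current[0] == seq:
            del self.pending[key]
```

If an object's position is updated three times between send ticks, only the third value is transmitted, and an acknowledgment for an older sequence number does not stop its retransmission. This gives ephemeral updates cheap delivery while ensuring the receiver is not left with a stale final value.
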

Publications:

[1] Daniel Horn, Ewen Cheslack-Postava, Tahir Azim, Michael J. Freedman, Philip Levis, "Scaling Virtual Worlds with a Physical Metaphor," IEEE Pervasive Computing, vol. 8, no. 3, pp. 50-54, July-Sept. 2009, doi:10.1109/MPRV.2009.54