Managing server-side state

Server-side state is any information that exists on the server, and persists between requests. This can be anything from a single flag all the way up to a large database result set. In a typical application, server-side state is the identity of the user (once the user logs in) and, perhaps, a few important domain objects (or, at the very least, primary keys for those objects).

In an ordinary servlet application, managing server-side state is entirely the application's responsibility. The Servlet API provides just the HttpSession, which acts like a Map, relating keys to arbitrary objects. It is the application's responsibility to obtain values from the session, and to update values into the session when they change.

Tapestry takes a different tack; it defines server side state in terms of either entire objects ( application state object s) or by allowing specific page or component properties to be persistent.

Understanding servlet state

Managing server-side state is one of the most complicated and error-prone aspects of web application design, and one of the areas where Tapestry provides the most benefit. Generally speaking, Tapestry applications which are functional within a single server will be functional within a cluster with no additional effort. This doesn't mean planning for clustering, and testing of clustering, is not necessary; it just means that, when using Tapestry, it is possible to narrow the design and testing focus. The Tapestry framework embraces the correct best-practices for managing client state on either a single server or a cluster of services.

The point of server-side state is to ensure that information about the user acquired during the session is available later in the same session. The canonical example is an application that requires some form of login to access some or all of its content; the identify of the user must be collected at some point (in a login page) and be generally available to other pages.

The other aspect of server-side state concerns failover. Failover is an aspect of highly-available computing where the processing of the application is spread across many servers. A group of servers used in this way is referred to as a cluster . Generally speaking (and this may vary significantly between vendor's implementations) requests from a particular client will be routed to the same server within the cluster.

In the event that the particular server in question fails (crashes unexpectedly, or otherwise brought out of service), future requests from the client will be routed to a different, surviving server within the cluster. This failover event should occur in such a way that the client is unaware that anything exceptional has occured with the web application; and this means that any server-side state gathered by the original server must be available to the backup server.

The main mechanism for handling this using the Java Servlet API is the HttpSession. The session can store attributes , much like a Map. Attributes are object values referenced with a string key. In the event of a failover, all such attributes are expected to be available on the new, backup server, to which the client's requests are routed.

Different application servers implement HttpSession replication and failover in different ways; the servlet API specification is delibrately non-specific on how this implementation should take place. Tapestry follows the conventions of the most limited interpretation of the servlet specification; it assumes that attribute replication only occurs when the HttpSession setAttribute() method is invoked.

Note:

This is the replication strategy employed by BEA's WebLogic server.

Attribute replication was envisioned as a way to replicate simple, immutable objects such as String or Integer. Attempting to store mutable objects, such as List, Map or some user-defined class, can be problematic. For example, modifying an attribute value after it has been stored into the HttpSession may cause a failover error. Effectively, the backup server sees a snapshot of the object at the time that setAttribute() was invoked; any later change to the object's internal state is not replicated to the other servers in the cluster! This can result in strange and unpredictable behavior following a failover.

Tapestry attempts to sort out the issues involving server-side state in such a way that they are invisible to the developer. Most applications will not need to explicitly access the HttpSession at all, but may still have significant amounts of server-side state. The following sections go into more detail about how Tapestry approaches these issues.

Persistent page properties

Servlets, and by extension, JavaServer Pages, are inherently stateless. That is, they will be used simultaneously by many threads and clients. Because of this, they must not store (in instance variables) any properties or values that are specific to any single client.

This creates a frustration for developers, because ordinary programming techniques must be avoided. Instead, client-specific state and data must be stored in the HttpSession or as HttpServletRequest attributes. This is an awkward and limiting way to handle both transient state (state that is only needed during the actual processing of the request) and persistent state (state that should be available during the processing of this and subsequent requests).

Tapestry bypasses most of these issues by not sharing objects between threads and clients. Tapestry uses an object pool to store constructed page instances. As a page is needed, it is removed from the page pool. If there are no available pages in the pool, a fresh page instance is constructed.

For the duration of a request, a page and all components within the page are reserved to the single request. There is no chance of conflicts because only the single thread processing the request will have access to the page. At the end of the request cycle, the page is reset back to a pristine state and returned to the shared pool, ready for reuse by the same client, or by a different client.

In fact, even in a high-volume Tapestry application, there will rarely be more than a few instances of any particular page in the page pool.

For this scheme to work it is important that, at the end of the request cycle, the page be returned to its pristine state . The prisitine state is equivalent to a freshly created instance of the page. In other words, any properties of the page that changed during the processing of the request must be returned to their initial values.

The page is then returned to the page pool, where it will wait to be used in a future request. That request may be for the same end user, or for another user entirely.

Warning:

Imagine a page containing a form in which a user enters their address and credit card information. When the form is submitted, properties of the page will be updated with the values supplied by the user. Those values must be cleared out before the page is stored into the page pool ... if not, then the next user who accesses the page will see the previous user's address and credit card information as default values for the form fields!

Tapestry separates the persistent state of a page from any instance of the page. This is very important, because from one request cycle to another, a different instance of the page may be used ... even when clustering is not used. Tapestry has many copies of any page in a pool, and pulls an arbitrary instance out of the pool for each request.

In Tapestry, a page may have many properties and may have many components, each with many properties, but only a tiny number of all those properties needs to persist between request cycles. On a later request, the same or different page instance may be used. With a little assistance from the developer, the Tapestry framework can create the illusion that the same page instance is being used in a later request, even though the request may use a different page instance (from the page pool) ... or (in a clustering environment) may be handled by a completely different server.

Tapestry is flexible about how these properties are ultimately stored. Tapestry 3.0 and earlier were less so ... persistent page properties were always stored in the HttpSession. Starting in release 4.0, persistent page properties may be stored in the session, or on the client.

Using Persistent Page Properties

Persistent properties make use of a <property> element in the page or component specification. Tapestry does something special when a component contains any such elements; it dynamically fabricates a subclass that provides the desired fields, methods and whatever extra initialization or cleanup is required.

You may also, optionally, make your page or component class abstract, and define abstract accessor methods that will be filled in by Tapestry in the fabricated subclass. This allows you to read and update properties inside your Java code, such as inside listener methods.

Note:

You only need to define abstract accessor methods if you are going to invoke those accesor methods in your code, such as in a listener method . Tapestry will create an enhanced subclass that contains the new field, a getter method and a setter method, plus any necessary initialization methods. If you are only going to access the property using OGNL expressions, then there's no need to define either accessor method.

Note:

Properties defined this way may be either transient or persistent. For transient properties (properties which do not persist between requests), it is only necessary to specify the property (with a <property> element) if it has an initial value. Tapestry scans the component class looking for abstract properties that don't match up against component parameters or <property> elements; for each of these unclaimed properties, a concrete property is created. The property is a transient property, exactly as if a <property> element did exist for it.

A page class that uses a specified property:

package mypackage;

import org.apache.tapestry.html.BasePage;
	
public abstract class MyPage extends BasePage
{
    abstract public int getItemsPerPage();
	
    abstract public void setItemsPerPage(int itemsPerPage);
}

This is combined with a <property> element in the page's specification:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE page-specification PUBLIC 
  "-//Apache Software Foundation//Tapestry Specification 4.0//EN" 
  "http://tapestry.apache.org/dtd/Tapestry_4_0.dtd">
	
<page-specification class="mypackage.MyPage">

  <property
    name="itemsPerPage"
    persist="session"
    initial-value="10"/>

</page-specification>

Again, making the class abstract, and defining abstract accessors is optional . It is only useful when a method within the class will need to read or update the property. It is also valid to just implement one of the two accessors. The enhanced subclass will always include both a getter and a setter.

Note:

In Tapestry 3.0, many users were frustrated that they had to specify the type of property in both the Java code and in the page specification. This is a violation of the Dont Repeat Yourself principal -- requiring coordination is just an invitation for the two sides to get out of synchronization. Starting with Tapestry 4.0, there is no type attribute on the <property> element; instead Tapestry matches the type to the property type of any existing accessor methods, and simply uses Object when there are no accessor methods. In this example, the persistent itemsPerPage property will be type int, because of the abstract accessor methods.

This exact same technique can be used with components as well as pages, the component specification also supports the <property> element.

When an initial-value is provided, it is evaluated once inside finishLoad() , but then again every time the page is detached (returned to pristine state) before being stored into the page pool for later re-use. By default, it is an OGNL expression, but this can be overriden by providing a binding reference prefix.

This means that you may perform initialization for the property inside finishLoad() (instead of providing an initial-value). However, don't attempt to update the property from initialize() ... the order of operations when the page detaches is not defined and is subject to change.

Warning:

If the values stored as a persistent property is mutable (for example, a List, or a Map, or some custom Java class), you should not modify its internal state once it has been stored into the persistent page property. Changes to internal state of a persistent property, after it has been stored, may not be propogated to other servers in the cluster (this is really a function of how session replication is implemented by the application server, it has very little to do with Tapestry). Similar problems concerning mis-matched state can occur with client persistence.

Persistence Types

Tapestry defines two basic types of property persistence. The type of persistence (internally known as the property persistence strategy ) is defined by the value of the persist attribute (in the <property> element). Omitting the persist attribute, or not providing a <property> element, indicates a transient page property, one which does not persist from request to request.

client
Client properties are stored on the client, in the form of query parameters. All persistent properties for each page are encoded into a single query parameter, named state: PageName . The query parameter value is a MIME encoded byte stream. This can get quite long if there are many client persistent properties on the page ... which may quickly run into limitations on the maximum size of a URL (approximately 4000 characters is a good guideline). This is less a problem for forms.

In Tapestry 4.0 the concept of a scope was introduced, specifying how long a property will persist. For client there exist two scopes: page and app. If you omit the scope it will default to client:page.

client:page

Persisted properties will be available until another page is activated and rendered.

client:app

Persisted properties will always be available.

session
The traditional style of property persistence (and the only kind available in Tapestry 3.0 and earlier). Each persistent property is mapped to a HttpSession attribute.

More such stategies are expected; these will give more control over the lifecycle of the page property.

Application state objects

What happens when you have objects that are needed by multiple pages? For that, you need an application state object . ASO's are global objects that can be accessed from any page or component in the application. Each ASO has a unique name (the two default ASO's are called "visit" and "global" for historical reasons). An ASO is created when first referenced by any page. ASO's with session scope are stored into the HttpSession at the end of the request (and may force the creation of the session ). ASOs in the application scope are available to any and all users.

Note:

Tapestry 3.0 had a more limited form of ASO: The Visit object and the Global object. The Visit object is session scope, the Global object is application scope. This concept has been extended in Tapestry 4.0 to allow any number of ASOs with any desired scope, and lots of flexibility on how ASOs get created. The Visit and Global still exist in 4.0, as default ASOs you can use or override.

Defining new Application State Objects

To create a new ASO, you must update your HiveMind module deployment descriptor and add a contribution to the tapestry.state.ApplicationObjects configuration point:

<contribution configuration-id="tapestry.state.ApplicationObjects">
  <state-object name="registration-data" scope="session">
    <create-instance class="org.example.registration.RegistrationData"/>
  </state-object>  
</contribution>

This defines an ASO named registration-data that is session scoped (stored in the HttpSession once created). The first time it is referenced, an instance of RegistrationData is created and stored into the session.

If your data object can't be created using a simple constructor, then you can supply an <invoke-factory> element instead of <create-instance>. <invoke-factory> allows you to reference an object or service that implements the StateObjectFactory interface.

Note:

The two default ASOs, "visit" and "global", are defined in the hivemind.state.FactoryObjects configuration point. Definitions in the ApplicationObjects configuration point override definitions in FactoryObjects with the same name.

Accessing Application State Objects

Tapestry provides an <inject> element to support access to the application state objects. This element can be used in any page or component specification to create a new property. Reading the property will obtain the corresponding state object (which will be created if necessary). The property may be updated, which will store a new application state object, overwriting the automatically created one.

For example:

<inject property="registration" type="state" object="registration-data"/>

This will create a registration property, which can be wired to components. Your class may define accessors for this property, in which case you should be sure that the application state object is assignable.

Warning:

The IPage interface defines two read-only properties: visit and global . These are both type Object. This is a holdover from Tapestry 3.0, which only supported these two application state objects. If you want to access the visit or the global application state objects without needing casts, you will have to inject as a differently named property (say appVisit or visitObject ).

Optimizing Storage

Normally, Tapestry has no way of knowing when the internal state of an ASO has changed. On any request where the ASO is accessed, Tapestry assumes that its internal state has changed. The ASO is re-stored at the end of the request. For session-scoped ASOs in a cluster, this is critical to ensure that information is properly distributed around the cluster.

However, it can also be expensive. Assuming that your application mostly reads information out of the ASO, that means a lot of wasted resources needlessly copying the ASO around the cluster.

To control this, ASOs may optionally implement the SessionStoreOptimized interface. The method isStoreToSessionNeeded() will be checked; if it returns false, the object will not be stored.

Typically, the ASO will store a dirty flag, and will set the dirty flag on any change to internal state. The flag will be returned by isStoreToSessionNeeded(). The ASO will also implement the HttpSessionBindingListener interface, and clear the flag in valueBound().

A base class, BaseSessionStoreOptimized , implements this behavior.

Stateless applications

In a Tapestry application, the framework acts as a buffer between the application code and the Servlet API ... in particular, it manages how data is stored into the HttpSession. In fact, the framework controls when the session is first created.

This is important and powerful, because an application that runs, even just initially, without a session consumes far less resources than a stateful application. This is even more important in a clustered environment with multiple servers; any data stored into the HttpSession will have to be replicated to other servers in the cluster, which can be expensive in terms of resources (CPU time, network bandwidth, and so forth). Using less resources means better throughput and more concurrent clients, always a good thing in a web application.

Tapestry defers creation of the HttpSession until one of two things happens: When a session-scoped application state object is first created, or when the first persistent page property is recorded. At this point, Tapestry will create the HttpSession to hold the object or property.

For the most part, your application will be unaware of when it is stateful or stateless; statefulness just happens on its own. Ideally, at least the first, or Home page, should be stateless (it should be organized in such a way that the Visit object is not created, and no persistent state is stored). This will help speed the initial display of the application, since no processing time will be used in creating the HttpSession.

The state: binding reference combined with the If component makes it easy for you to skip portions of a page if a particular ASO does not already exist; this allows you to avoid accidentally forcing its creation on first reference.

The application may be stateless even when it has persistent page properties, if those properties use the client persistence strategy (which encodes pesistent page data into URLs as query parameters). This can be a very powerful approach, though it introduces its own problems:

  • The query parameters are an encoding of Java objects, and could be decoded to expose privileged information.
  • The encoding of page state can result in very long strings included as part of URLs, possibly extending beyond the 3000 to 4000 character effective maximum URL length.