Conventional Web sites consist of fairly static information. People are only given material that is targeted toward them when they tick a box in a registration form saying "I'm interested in golf" or "Yes, please sign me up for the e-commerce newsletter". User profiling is all about doing away with the idea of asking users what they want, and instead figuring it out automatically from what they do and where they go on the site, then automatically generating content based on this information.

The concept of automation is simple: by installing an automated system to follow users around the site, you can ensure that what they see - both content and advertising - is as relevant as possible to each specific user. If, for example, the system knows that a particular user tends to hit the sports news section first, it will serve them sports-related advertising. Someone who often goes to the motoring pages would get ads for BMWs, and someone with no particular pattern, whose surfing is fairly random, would get generic ads.

So How Is This Different?

The obvious question is: how does this 'profiled' approach differ from what we have now? After all, if a user goes to the sports section of a site, they are likely to see content and advertising relating to sporting and leisure goods. The difference is that because the system knows the profile of the user, and that the user has (in this case) a significant perceived interest in sport, it can serve sport and leisure advertising to that user wherever they are on the site, not just in the sports section.

Users will generally have a number of perceived interests, which the system will weight against each other in order to produce a sensible variety of advertising. But the point is this: there is no longer a need to restrict subject-specific advertising to one particular section of the site.

Patterns Of Use

The aim is to discover each user's pattern of usage.
There are a number of factors to this pattern, but the main criteria are the user's identity (who), the pages they have viewed (what) and the dates and times of those accesses (when).

By identity, incidentally, we do not mean that we actually have to know who the user is - their name, address, email address, and so on. All the system needs to do is discriminate between the different people using the site. So if someone comes to the site for the first time, they will be allocated a unique identifying number, generally using a cookie. The next time that user visits the site, the profiling system will see that identifying number, look up the record in its profile database corresponding to the number, and pull out the necessary information about that user's surfing habits.

The only drawback, of course, is that some users turn off the cookie support on their browsers, labouring under the misconception that cookies are in some way insecure. There are ways around this, though. For example, the site can be implemented behind the scenes as a set of HTML forms, with the user identifier passed around as a hidden data field.

The 'what' aspect is of paramount importance - the whole point of profiling is that the system detects what the user is choosing to read and acts upon that information. The main thing to consider with the 'what' aspect is that the system must be able to work out what type of information is being viewed - to classify the content. If a site is split into sections, this classification could be as simple as telling the profiler: "anything under /leisure is classed as sport and leisure". The most accurate approach, though, is for the profiler to analyse the traffic as it flows to the user and make intelligent decisions on what it finds.
This approach avoids the need to constrain a site's structure by forcing it into contrived 'sections' - a useful property given that a Web site is an arbitrary network of pages, not just another hierarchical directory of data a la FTP or Gopher. The obvious example here is the growing number of "community" Web sites - the range of interests covered by the members is sizeable, and it is artificial to attempt to say to people "Please classify your home page in one of these ten categories".

Finally in this section we have 'when'. There are two aspects to this parameter - when something happened and for how long it continued. Let us look at the latter first. If someone clicks a link and then immediately visits something else, this could well mean the first click was a mistake. If, on the other hand, a link is clicked and then it is several seconds before the next click, the chances are that something on that page was of interest to the user. Of course, if nothing happens for some minutes, the user has probably gone away, so timeouts are important.

The time and date of an access is probably the most important factor. Imagine a user makes fairly intense use of the motoring pages for two or three weeks, reading articles, searching databases of cars, entering words like "Renault" and "Rover" in the search boxes, and posting messages to forums. Now imagine that usage stops almost instantly, and that for the past week or two there has been only a small amount of motoring-related activity from that user. The chances are that the user was looking to buy a new car - and that during the period of intense activity in the motoring section, he or she was an ideal candidate for advertising promoting Renaults, Rovers, Daewoos and suchlike. The drop-off in usage probably means they have now bought their car, and the small amount of usage comes from postings on the messageboard system saying "Thanks for your advice - I ended up buying XXX".
This does not necessarily mean that the user should see no more motoring ads - in fact during the period immediately following the purchase, they may well be in the market for new brands of wax polish, navigation computers, alloy wheels and furry dice. The point is: all data must be aged appropriately to accommodate people's changing preferences and surfing habits, both sudden and gradual.

Fast Response

The main requirement for a profiling system is that it should react quickly to a user's needs. Some people believe that a daily 'batch run' incorporating the previous day's tracking data into the user profile databases is acceptable. Frankly, if this were the case, people would simply be writing short scripts to analyse the referer log files (a standard component of any Web server) for the past day/week/month and incorporate the result in the ad serving engine.

The most effective way to work is to incorporate surfing habits as quickly as possible into what the user sees. As will be seen later, there is one startling way to approach this, but the basic need is to detect user movement and update the profile database on the fly, so that new data is immediately available to the system that decides what to serve to whom.

The best possible result is to be able to present even the newest of users with advertising that is as well targeted as possible given the potentially limited amount of information available about that user. Still, while you can design algorithms that make the best possible use of data as quickly as possible, the laws of statistics dictate that if you have next to no data to go on, even the most effective algorithm cannot conjure up a profile from nothing.

Thinking Laterally

We mentioned earlier that there is no real need to know who a user is when building their profile. All we need to do is identify each user uniquely with a code number and we can begin to build the profile.
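As a rough illustration of the ideas so far (an anonymous identifier, profile updates made on the fly, and data that is aged over time), here is a minimal Python sketch. Everything in it is an assumption made for the example rather than a description of any real product: the `ProfileStore` class, the two-week half-life, the `MIN_EVIDENCE` threshold and the category names are all hypothetical, and exponential decay is just one plausible way to age data.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400.0
HALF_LIFE_DAYS = 14.0  # assumption: a page view counts half as much after two weeks
MIN_EVIDENCE = 1.0     # assumption: below this total weight, fall back to generic ads


class ProfileStore:
    """In-memory stand-in for a profile database, keyed by an anonymous user ID."""

    def __init__(self):
        # user_id -> list of (category, unix_timestamp) page views
        self._views = defaultdict(list)

    def record_view(self, user_id, category, when):
        """Update the profile immediately, as each page is served."""
        self._views[user_id].append((category, when))

    def interests(self, user_id, now):
        """Return {category: weight}; older views contribute less weight."""
        weights = defaultdict(float)
        for category, when in self._views.get(user_id, []):
            age_days = max(0.0, now - when) / SECONDS_PER_DAY
            weights[category] += 0.5 ** (age_days / HALF_LIFE_DAYS)
        return dict(weights)

    def ad_category(self, user_id, now):
        """Pick the strongest current interest, or 'generic' when data is thin."""
        weights = self.interests(user_id, now)
        if sum(weights.values()) < MIN_EVIDENCE:
            return "generic"
        return max(weights, key=weights.get)


store = ProfileStore()
now = 100 * SECONDS_PER_DAY

# Heavy motoring use eight weeks ago, a little sport yesterday.
for _ in range(5):
    store.record_view("user-1", "motoring", now - 56 * SECONDS_PER_DAY)
for _ in range(2):
    store.record_view("user-1", "sport", now - 1 * SECONDS_PER_DAY)

print(store.ad_category("user-1", now))    # prints: sport
print(store.ad_category("new-user", now))  # prints: generic
```

The five old motoring views have decayed to a combined weight of about 0.31, so the two recent sport views win. A real system would presumably tune the half-life per category and keep some residual interest after a purchase, as the furry dice example suggests.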
Obviously we can add personal information should it become available. If, for example, our site has an online shop and someone makes a purchase, we need their name and address to deliver the goods. We can simply make a link between the database of people we know and the profiler's data repository. For profiling purposes, though, this is not essential.

So when a user first visits our site, where could we possibly obtain information about that user? If they are a first-time visitor, our own systems know nothing about them, but it would be most useful to be able to serve a relevant advertisement to each person the moment they land on our site. Thinking laterally, we ask: is there another site they have been to which knows about them? The answer is probably "yes".

This is the approach taken by a service from Engage (www.engage.com) called Engage Knowledge. The theory is that data concerning the surfing habits of a particular user, if stored in a central repository rather than on a Web site's own servers, can be utilised by other sites in the group. All that is required is a two-way flow of information: from the site to the central repository (anonymous, of course, to keep the data protection authorities happy) and from the repository to the site. So any profiling information gleaned from a Web site is uploaded to the service provider's database, and when a user comes into the site, the service provider's database is interrogated for information about that user.

The more sites that connect to the repository, the more information there is available about more users. Using this kind of system, the instant a new user lands on a site, we know something about them because they have probably been to several other sites using the same service, and we can download their profile information.

Not Just Advertising

Of course, you can do more than just intelligently serve advertising.
The same profiling theories can be applied just as effectively to the content on the pages the user sees. For a content-based site this means users can have pages automatically tailored to fit their perceived interests, and for e-commerce sites there is the ability to flash different special offers to different users, depending on which products they are most likely to buy.

Content automation is big business. The best-known name in content automation is Autonomy (www.autonomy.com), whose Portal-in-a-Box package can bolt into existing Web sites to add content automation. Other offerings are available as services, rather than in software form - such as h3 from Hyperlink (www.hyperlink.com) and the Networked Personalization Service from TriVida (www.trivida.com).

The obvious connection between content-based and ad-based profiling is now starting to be made. Autonomy is working on an interface into Engage's AdServer engine. Even if you are using an ad engine that requires you to tag pages in order to target ads, you could do some constructive page-building using a profiling system to dynamically include the tags, which the ad service then uses to display relevant advertising.

Sites with customised content are 'stickier' - it is easier for users to find material that interests them, as it can be presented as links on pages they are seeing instead of them having to search for it themselves. This not only increases the page impressions of the content side of the site, but as a consequence also produces more ad impressions.

Better Than Before

Without user profiling, basic statistical information can be gleaned (in hindsight only) from traditional access and referer logs. But the results can be skewed by all manner of factors, such as multiple users appearing to access the site from the same address, as happens for users behind a proxy server or firewall.
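To make the skew concrete, the following small Python sketch groups the same page hits first by client address and then by cookie identifier. The records, addresses, cookie values and the `group_paths` helper are all invented for the illustration; real access logs would need parsing first.

```python
from collections import defaultdict

# Hypothetical hit records: (client_address, cookie_id, path)
hits = [
    ("10.0.0.1", "a1", "/sport/news"),
    ("10.0.0.1", "b2", "/motoring/reviews"),  # a different person behind the same proxy
    ("10.0.0.1", "a1", "/sport/results"),
    ("192.0.2.7", "c3", "/leisure/golf"),
]


def group_paths(records, key_index):
    """Group the requested paths by the chosen field of each record."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key_index]].append(record[2])
    return dict(grouped)


by_address = group_paths(hits, 0)  # what a log-based analysis sees
by_cookie = group_paths(hits, 1)   # what a cookie-based profiler sees

print(len(by_address), "apparent visitors by address")
print(len(by_cookie), "visitors by cookie")
```

Grouped by address, the sport reader and the motoring reader behind the proxy merge into one apparent visitor with mixed interests; grouped by cookie, they stay separate.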
You can also use traditional systems such as Web forms with interminable lists of checkboxes for people to click to register their preferences. Yet these are tedious and produce a low response rate from users who simply can't be bothered to fill them in.

The only way to make profiling work effectively is for it to interact with the user as pages are being viewed, using cookies or other script-based mechanisms to uniquely identify the user and the session. When implemented in this way, user profiling is a powerful tool, and the entire view a user has of the site can be customised on the fly to present them with the most suitable content.

By the same author: DIY User Profiling

Dave Cartwright's first proper job was running NetWare 2.0a servers for a defence contractor and fighting with a digital phone switch (one of the first of its kind). Having graduated with a boringly technical degree in theoretical computer science, he became a Unix systems and network manager at UEA, Norwich, UK. While there, the Internet came to UK academia and later Mr. Berners-Lee came up with this Web thing (an excellent excuse to 'research new technology' rather than doing boring support stuff). Before disappearing into journalism in '95 (as technical editor of Network Week) Dave did a lot of work back-ending Web servers with databases. Having earned an easy living for a couple of years as a techie writer, he then went back into the real world as IT and Telecomms Manager at CMP UK. He's now Chief Technology Officer at Vavo.com.