Sean Butler

smbutler@gemini.ibm.com

RIT: ICSA750 – Distributed Computing

March 1996

We live in a data age. Despite what we see, hear, and read in the media, we are not yet in the information age. More data exists now than at any time in the history of the world, but we cannot yet call it information, as it is too difficult to find what is needed in a timely manner.

Webster's dictionary (1) defines information as:

information (n)                            
DEFINITIONS:                               
1   the communication or reception of      
    knowledge or intelligence              
2a  knowledge obtained from investigation, 
    study, or instruction

Webster's defines data as:

datum (n)                                         
data                                              
DEFINITIONS:                                      
1   (pl) something given or admitted esp. as      
    a basis for reasoning or inference            
data                                              
2   (pl but sing or pl in constr) factual         
    material used as a basis esp. for             
    discussion or decision

From those definitions, we can see that information is processed data. Until the World Wide Web (WWW) was created, finding data on the Internet was quite tedious. Currently, there are WWW tools that can find the data, and present it to you, so it is one step closer to being information. However, processing the huge amount of data the WWW tools do present to you is still tedious. In the future, there will be WWW tools that will find the data, process it to transform it into information, and then present it to you. There will also be services that present information to individuals based on personal profiles of their interests, since it is impossible to keep up with and find such information by ‘surfing’ the Net.

The Internet is a distributed system of computers and networks, and it has data and resources everywhere, but it is not concise, timely, or easy to get to. There is no overall organization or classification of the data. The WWW is not and will not change that distribution of data, or try the impossible task of organizing it, but it does give the end user the appearance of some organization.

This paper examines the history of the WWW, and briefly covers the technology and schemes behind it. It investigates how the WWW will be our tool to pull from the data warehouse of the Internet to give us timely, precise, accurate information, or, in other words, to carry us from the Data Age into the Information Age. Concerns such as the costs involved for users, and the solutions to storage and bandwidth issues for the future directions of the WWW are not discussed.

A Brief History and Overview of the Web

A short history of the Internet is needed to understand how and why the Web was developed, and its possible future directions. The Internet was created as a research project of the US Defense Advanced Research Projects Agency (DARPA) (2,3). DARPA wanted to create a network that was not centralized, and that could continue to operate when an entire section of the network went down (e.g., was destroyed by a nuclear warhead). Messages sent from one computer to another would be broken down into packets, and the route the packets took could be different from one minute to the next.

In 1969, the ARPANET was created, and from it the Internet grew. In the early years, mostly government and research facilities (e.g., universities) connected their own networks to it. Eventually, when the commercial opportunities were realized, a snowball effect set in, producing the growth of the Internet that we have seen over the past few years.

As the Internet grew, a culture developed where the dissemination of information was a primary goal, and tools such as FTP, GOPHER, WAIS, and ARCHIE were developed to help find and deliver that information. Basically, the information could be sent and received as ASCII or BINARY files, and an application on the receiving station would be used to view or use the file. The tremendous amount of information, and the many types of applications needed to distribute and receive it, were a deterrent to its distribution (2).

In 1990, Tim Berners-Lee and Robert Cailliau, two scientists working at CERN (the European Particle Physics Laboratory), began to develop a system to distribute their research information across CERN’s global network (2,3). They created an application with a single user interface to many types of data, including reports, databases, and technical documentation.

The next big step in the growth of the WWW was the February 1993 release of “Mosaic for X,” by Marc Andreessen, who at that time was a programmer at the National Center for Supercomputing Applications (NCSA) (2). Mosaic, a Web browser with a GUI point-and-click interface, would become the standard platform for Web access. In September of 1993, NCSA released browsers for Windows-based PCs and Macintosh computers. In March of 1994, Andreessen left NCSA to form Netscape, to market Mosaic and other WWW products full time (3).

The Web can be described as “a distributed heterogeneous collaborative multimedia information system” (2). Basically, this just means that the Web is a way of distributing all types of data across all types of computers in a standard format.

An interesting quote on this distribution of information and how it affects society and networking, from Stanford lecturer Robin Milner, is paraphrased here:

Think very deeply. Very very deeply. When you think both about computers and the real world, when one talks about mobility of objects, it isn’t really the mobility of objects that matters, the objects might not even really exist, all that really matters is the movement of the access paths/addresses to the objects. These access paths are the only component of the system we actually need to reason about. When Robin Milner changed jobs what matters to other people isn’t where Robin “really is” what matters are the access paths to him. Some of these access paths remained the same (email), some changed (walking), and some new access paths were created (a new email address).

— Above pulled from (7) Gordon Irlam’s page

The development of the Web is analogous to Gutenberg’s invention of the printing press (2). It is also more far reaching, because the Web allows the combination of several media, including text, graphics, audio, and video, into one document. Multimedia programs of this type had already been developed for single computers, but with the Web, you can link from one document to many others, and the pieces of data that make up the document can be found on computers around the world.

The above information was composed from the following references, most of which are available on the WWW:

A Brief Overview of the Technology

As described above, the WWW was developed so that any resource on it, whether that resource be a file, a picture, or a report, could be found and accessed with one simple interface. It was also designed so that each resource could be included in other resources, or could have pointers to other related resources. There is such an incredible amount of data on the Web, and more is emerging each day, that knowing how to locate what you want is quite difficult.

Even if you know the location of every resource you want to view, you must also be able to tell others how to find them. The developers of the Web came up with a logical addressing scheme. The Internet already had a scheme for addressing connected computers, so the Web was just another addition to that scheme. The computers on the network use the TCP/IP address scheme of xxx.xxx.xxx.xxx, where each xxx is a decimal number from 0 to 255 and certain values are reserved for special purposes (8). However, humans remember and work with names better, so we use them instead of numbers. The names we use are resolved to numbers that the network understands by special computers on the Internet called Domain Name System (DNS) servers.
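The numeric addressing described above can be illustrated with a minimal Python sketch. It is only an illustration: the address 192.0.2.10 is a documentation placeholder rather than a real host, and the function names describe_address and resolve_name are hypothetical names chosen for this example. The sketch splits a dotted address into its four fields and shows the same address as a single number; the name lookup helper needs a live network connection, so it is defined but not called.

```python
import ipaddress
import socket

def describe_address(dotted):
    """Split a dotted-quad address into its four numeric fields."""
    # IPv4Address raises ValueError if any field is missing or above 255.
    addr = ipaddress.IPv4Address(dotted)
    octets = [int(part) for part in dotted.split(".")]
    return {
        "octets": octets,
        "as_integer": int(addr),  # the same address as one 32-bit number
    }

def resolve_name(host_name):
    """Ask the system's DNS resolver for the numeric address of a name.

    Needs network access, so it is not called in this sketch.
    """
    return socket.gethostbyname(host_name)

info = describe_address("192.0.2.10")  # placeholder address for illustration
print(info["octets"])      # [192, 0, 2, 10]
print(info["as_integer"])  # 3221225994
```

A field outside the 0 to 255 range, as in "300.1.1.1", is rejected with a ValueError.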

Addressing for the Web is based on Uniform Resource Locators, or URLs. A URL points to a specific resource on the WWW, and is broken down as follows (2):

scheme://host[:port]/path/filename

Here is what each part of the URL can be:

scheme What type of resource this is. Examples are HTTP for a document served over the HyperText Transfer Protocol, FTP for the File Transfer Protocol, GOPHER for the text-based, menu-driven protocol that was a precursor to the Web, and so on.
host[:port] The host is the actual name or number address of the computer on the Internet that stores the resource you are referencing. The Internet is made up of millions of computers distributed around the world, and they can all contain resources that others have been granted access to by the administrator of that host. The port is the actual TCP/IP port that should be used on the host. TCP/IP is the protocol used on the Internet to interconnect different networks, and for each transaction that takes place, a port number between 1 and 65535 must be specified for each host. Many of these ports are reserved, such as port 23 for TELNET, while many others are free for use by any application. Often, the port will not be specified, as it is left to default to the standard value for the given scheme.
path This is the directory in the operating system on the specified host that contains the resource.
filename The actual resource in the URL. The extension on the filename (filename.ext) is used by the Web browser to know how the resource should be presented. For instance, .htm or .html indicates an HTML file (see below), .au indicates an audio file, etc.
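As an illustration of the breakdown above, here is a minimal Python sketch that uses the standard urllib.parse module to split a URL into the same parts. The URL itself and the function name break_down are made up for this example.

```python
from urllib.parse import urlsplit
import posixpath

def break_down(url):
    """Split a URL into scheme, host, port, path, and filename."""
    parts = urlsplit(url)
    # Separate the directory portion of the path from the final filename.
    directory, filename = posixpath.split(parts.path)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL leaves the port to default
        "path": directory,
        "filename": filename,
    }

example = break_down("http://www.example.com:8080/docs/papers/index.html")
print(example)

# When no port is given, the scheme's standard port applies.
no_port = break_down("http://www.example.com/docs/papers/index.html")
print(no_port["port"])  # None
```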

HyperText Transfer Protocol (HTTP) and HyperText Markup Language (HTML)

The Web is designed on the client/server model, where an HTTP server holds the resources and the Web browser is the client. The client and server have a conversation, in which the format of the resources is negotiated (8). The most common format is the HyperText Markup Language (HTML).
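To make the client's side of that conversation concrete, the following Python sketch builds the text of a simple HTTP request. It is an assumption-laden illustration, not a full client: the host name, the path, and the function name build_get_request are invented for the example, and nothing is sent over the network. The Accept line is where the client lists the formats it is prepared to receive.

```python
def build_get_request(host, path, accepted_formats):
    """Build the text a browser might send to ask a server for a resource."""
    lines = [
        "GET " + path + " HTTP/1.0",                # what the client wants
        "Host: " + host,                            # which server it is asking
        "Accept: " + ", ".join(accepted_formats),   # formats the client can display
        "",                                         # a blank line ends the header
        "",
    ]
    # HTTP separates header lines with carriage return + line feed.
    return "\r\n".join(lines)

request = build_get_request(
    "www.example.com", "/index.html", ["text/html", "text/plain"]
)
print(request)
```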

HTML is perhaps the most important format used on the Web (7). The format of HTML is relatively simple, which is just one reason the Web has taken off as it has. HTML uses tags to format documents. A tag begins with a “<” and ends with a “>,” and has the form ‘<tag>.’ A formatted element starts with an opening ‘<tag>’ and ends with a matching closing ‘</tag>.’

Here are a few basic examples of text formatting:

<h2>Title</h2>
<b>bold face</b>
<p>entire paragraph</p>

There is much more to HTML, including advanced formatting tags and ways to include references to other documents, to graphic images that can be included in the document, etc. For further reading, see the following URLs:

The most important aspect of HTML and HTTP is that together they allow any given document to reference, or point to, any other resource, anywhere on the Web. This feature is known as hypertext, and it really promotes the distribution of data. These references to other resources appear highlighted in the document that you are viewing, and when you select a highlight (usually by clicking on it with your mouse), your browser will load the new document from the HTTP server the reference points to.
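In HTML, such a reference is written with the anchor tag, in the form <a href="URL">highlighted text</a>. The following Python sketch, offered only as an illustration, uses the standard html.parser module to collect every reference in a small page. The page text, its URLs, and the class name LinkCollector are invented for the example.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the target of every <a href="..."> reference in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A made-up page using the tags shown earlier plus two references.
page = """<h2>Title</h2>
<p>See the <a href="http://www.example.com/history.html">history</a>
and a <a href="ftp://ftp.example.com/pub/report.txt">report</a>.</p>"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

Note that the two references use different schemes (HTTP and FTP), which is how a single document can point at resources of several kinds.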

Note that hypertext has really evolved into hypermedia (2). Instead of highlights in a document that reference another text document, you can now reference pictures, audio, video, etc. A timeline of the development of hypermedia is given in (9).

The above information was composed from the following references, available on the WWW:

The Future of the WWW

The Web provides a new paradigm for computer networking. The importance of the Web on society has as yet barely been noticed. Society is defined by patterns of human communication. Anything that changes those patterns will change society in a fundamental way.

— Taken from (7) Gordon Irlam’s page

The WWW has come a long way in a very short amount of time, and it will continue to grow and evolve. There are many problems currently facing the Web and the Internet, such as limited addresses in the TCP/IP scheme, tremendous growth and limited bandwidth, capacity for storage of the vast amounts of data that are being put on the Internet, etc. All of these problems will be overcome, though, with new and better technology. A more interesting problem to discuss, however, is that of the data age, where we are overcome by tremendous amounts of data that we cannot possibly process ourselves, vs. the information age, where the burden of the transformation of data into information is handled by the computer.

The Web represents a vast array of data that is distributed and linked together on computers around the world. However, there is so much data that it is difficult to find the information you need quickly. The Web is an unstructured resource, and there is no overall control to help structure it (10). There are no standard formats, no organization of subjects, and no comprehensive indexes. Basically, there is a wealth of data, but no way to take that data and turn it into information. The future of the WWW will greatly depend on the development of tools that can take the burden of transforming the data into information off humans and place it on computers.

Aspects of the Future

The WWW is basically used for three purposes: getting information, commerce, and entertainment (11). Information includes news services, reference materials, and journals. Commerce is just that: doing business over the Web, i.e., buying and selling products and services. Entertainment includes games, surfing just to see what is out there, and the audio and video transmissions that are becoming more popular.

A division of the Internet into three separate networks to support these different types of users might make sense to some people because all three categories involve different interfaces to the content, and various levels of response are necessary. This is unlikely to happen since the backbone network and infrastructure are already in place, and changing that would not be cost efficient. However, it is interesting to discuss the future implications of these three divisions. Since the commerce segment and entertainment segment are not as relevant to the WWW carrying us from the Data Age into the Information Age, most of the discussion will focus on the information segment.

Commerce

The business domain of the Internet is in its infancy, mainly for two reasons. First, a large portion of the Internet does not include the average consumer. This is rapidly changing as more and more people get access to the Net. Once the majority of people have access, businesses will be ready to take off. The second problem is with the security and integrity of transactions. Everyone is afraid that their credit card numbers will be intercepted and used by others. Secure transaction methods, involving encryption or “cybercash,” will need to become standards before the masses are willing to do business over the Net. Commerce will affect the data vs. information issue in some ways, because although many of the information resources available today are free, this will not remain so in the future.

Entertainment

This segment of the Web is the most difficult to predict (11). The potential exists for intense, graphical interfaces to games with users around the country and the world. Video on demand, including TV shows or movies, may be delivered over the Web instead of by the normal cable television companies. The Internet, telephone companies, and cable companies may combine to form a single backbone that can carry all three types of transmissions. In any case, there will always be ‘surfers’ of the Web, but more advanced forms of entertainment will also be available. The entertainment segment of the Internet may affect the data vs. information issues in some ways, but not to any great extent. For instance, consumers may have one-stop shopping for their telephone, cable TV, Internet access, and information services.

Information

The heart of the Internet was and still is information. The Internet developed in a culture where the dissemination of information was the goal. As entertainment and commerce pick up, the share of information processing on the Internet will diminish. However, there will still be vast amounts of data and resources available, and there will be two problems. First, there is currently a problem with the quality of resources. However, that will improve over time, as users of the Web will not visit sites that have poor quality, so those sites will eventually either be removed or improved by their authors. The second problem is the quantity of information. There is too much data for humans to effectively process, so several developments will need to occur.

Searching for information

One such development that is already in the early stages is the many search engines available to assist in finding resources on the Web. There are many such search sites today, and several use different techniques to help users find what they are looking for (10). Some give lists of possible ‘hits’ on the given search criteria, and some give ‘scored’ lists, where the search engine rates each reference it has found. Some actually search the contents of Web pages, while others are subject-oriented tools, where the owner of the search site must add resources to different sections of the tool. Popular search tools of the scored and subject nature are:

The problem with these types of search engines is that the human still has to go in and read each site the tool finds to see if it is really relevant to the search criteria. The human is still the final processor of the data and in charge of turning it into information. However, with the vast amounts of resources on the Web, this process will soon be too much for a person to handle. In fact, in my research for this paper, I found thousands of documents for each of the following search strings:

“history WWW”
“Development internet history”
“technology internet”
“future internet”
“future World Wide Web”
“speculation future Internet”
etc.

Since many of the lists I got were too large, I had to be more specific in my searches. Eventually I got to a manageable list of possible sources of information relevant to this paper. I still had to decide which of the hits I thought might be useful, as there were too many for me to read them all. Most of the time I spent on this paper was spent “surfing” the Web for information relevant to this topic.

We are in desperate need of tools that will get us away from surfing, where we spend hours meandering around the Internet trying to find what we are looking for. We need tools that help us sort through the data, process it, and present us with information.

The newer search engines that rate the hits they find are a step in the right direction, but the human who entered the search must still take the time to study the links and decide if that data is useful. The subject-based search engines, such as Yahoo, often have a brief description of the document, but the human still must go into the document to analyze it further. The search tools of tomorrow must somehow reduce their hits and make those hits more relevant, or summarize the hits in some way, so that the human is not wasting time wading through data. The tools must present information to the human based on the search criteria.
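The idea of a ‘scored’ list can be sketched in a few lines of Python. This is a deliberately simple illustration and not how any particular search engine works: it assumes that a page's score is just the number of times the search words appear in it, and the page names, page texts, and function names are all invented for the example.

```python
def score(document, query):
    """Count how many times the query words occur in the document."""
    words = document.lower().split()
    return sum(words.count(term) for term in query.lower().split())

def scored_hits(documents, query):
    """Return (score, name) pairs for matching pages, best score first."""
    hits = [(score(text, query), name) for name, text in documents.items()]
    hits = [hit for hit in hits if hit[0] > 0]  # drop pages with no match
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))

# Made-up pages standing in for documents found on the Web.
pages = {
    "web-history.html": "history of the web and the internet history",
    "recipes.html": "bread and soup recipes",
    "future.html": "the future of the internet",
}

results = scored_hits(pages, "internet history")
print(results)  # [(3, 'web-history.html'), (1, 'future.html')]
```

Even with scores, the reader still has to open each hit to judge whether it is useful, which is exactly the burden described above.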

Besides the problems of search tools that must be addressed, there are a few other important aspects of information that must also be improved. Authors of information will need to come up with new and innovative ways to present their information over the Web, which we are already seeing. In addition, since information is time-bound and subject to corrections, the Web will provide an ever-changing landscape of information (11). Information providers will no longer sell an encyclopedia set to a person every ten years, but instead will sell on-line licenses for, or access to, that encyclopedia, so when changes are made, consumers are not left with outdated versions of the material.

Having information delivered to you

There is a tremendous amount of information being published on the Web today. Below is a brief summary of the many types of resources:

There are many newspapers, such as USA Today and news presented on the Web from television companies, such as CNN. These are national sources, but we are now beginning to see local sources emerge.
There are also many scientific and trade journals electronically available over the Web on every imaginable subject.
There are many magazines that are now offered online. Some of these are offered only electronically, such as Wired Magazine, and others are offered both in a more traditional hard copy format and electronically, such as Time Magazine.
There are also many reference type materials, such as single language dictionaries, dual language dictionaries (such as an English – Spanish dictionary), thesauri, encyclopedias, and research papers. The Yahoo search site has an entire category dedicated to research resources.
In the near future, there will be more and more broadcast media, such as video and audio from news sources such as CNN, ESPN, local TV and radio stations, etc.

As is clearly evident from the list above, there is a tremendous amount of information now available on the Web. However, it is becoming increasingly difficult for a single person to keep up with everything that they are interested in. It takes too much time to wade through all of these sources to find the few resources that are important to an individual.

What is needed are personalized information services (12). These services will allow users to create profiles of themselves that define the topics that are interesting and important to them. The service will then create a list of new resources based on that profile and forward it to the user. This list could be created as often as the person wants, although most people would probably opt for daily or weekly lists. The user can then read the articles, view the video, or listen to the audio of those resources that match their profile, instead of trying to find articles in the vast amount of data that is released every day. This is more than a personalized newspaper; it is a personalized information service.

How this information is delivered is not too important. Perhaps the simplest way would be to have a personalized home page that is updated daily, weekly, etc., by the service provider. Then new resources that meet a person’s profile can be added to their page, so that they could point and click to view them at their leisure.

Profiles could include such criteria as “College Football,” “Tax Laws in Florida,” “Cholesterol,” etc. Then, when any new resource related to one of those topics is introduced on the WWW, the personal information service will point the user to it.
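The profile matching described above could be sketched as follows in Python. This is a minimal illustration under a strong simplifying assumption: a resource matches a topic only when the topic's exact words appear in its text, so a real service would need much smarter matching. The resource titles, their texts, and the function names are invented for the example.

```python
def matches_profile(resource_text, profile):
    """Return the profile topics whose words appear in the resource text."""
    text = resource_text.lower()
    return [topic for topic in profile if topic.lower() in text]

def build_digest(new_resources, profile):
    """List each new resource that matches at least one profile topic."""
    digest = {}
    for title, text in new_resources.items():
        topics = matches_profile(text, profile)
        if topics:
            digest[title] = topics
    return digest

# The example topics from the text above.
profile = ["College Football", "Tax Laws in Florida", "Cholesterol"]

# Made-up resources standing in for one day's new material on the Web.
new_resources = {
    "Saturday scores": "A roundup of this week's college football games.",
    "Heart health": "New study links diet and cholesterol levels.",
    "Gardening tips": "How to prune roses in spring.",
}

digest = build_digest(new_resources, profile)
print(digest)
```

Here the gardening article is left out of the digest because it matches none of the topics in the profile.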

The drawback to this approach is that people may be limiting the information they see. However, right now, I use my own criteria to decide what I want to read and what I don’t want to read. If I can do that, there should be no problems with customizing a tool that can assist me. This tool may be a few years away, but it will not be impossible.

Also, the tool could offer a way to customize your own home page of information useful to you. If, after reading an article from a periodical or viewing a video, you feel that you may need that information again, you could save that link. You could also add links to more permanent sources, such as technical manuals and references. Then the tool and/or provider could send you notices when there are changes to those more “permanent” resources. In this way, people no longer buy an encyclopedia that quickly becomes outdated; instead, they pay for a license to an encyclopedia that is updated as often as necessary.

The above information was composed from the following references, available on the WWW:

Conclusions

The Internet, and especially the World Wide Web, represents a worldwide distribution of data that is accessible by anyone with a connection to the Net. The problem is that there is a tremendous amount of data dispersed on the Net, and that the data is not organized in any way. The data is growing at such a fast rate that it will soon be impossible for a single person to find useful information in a reasonable amount of time. The future of the WWW depends on the development of tools that assist people in sorting through the vast amount of data available, and on services that deliver information to people based on their own personal profiles.

References

The references listed after each section are all included here.


The Future of the Web: The Coming of the Information Age
