Unit 1 introduction to web programming

A BRIEF INTRODUCTION ABOUT THE INTERNET (Origins)

1960s: The U.S. Department of Defence (DoD) became interested in developing a new large-scale computer network. The purposes of this network were communications, program sharing, and remote computer access for researchers working on defence-related contracts. The DoD’s Advanced Research Projects Agency (ARPA) funded the construction of the first such network, which was therefore named ARPAnet. The primary early use of ARPAnet was simple text-based communication through e-mail.

Late 1970s and early 1980s: BITNET, an acronym for Because It’s Time NETwork, began at the City University of New York. It was built initially to provide electronic mail and file transfers. CSNET is an acronym for Computer Science NETwork. Its initial purpose was to provide electronic mail.

1990s: NSFnet, which was created in 1986 and sponsored by the National Science Foundation (NSF), had replaced ARPAnet by 1990. By 1992, NSFnet connected more than 1 million computers around the world. In 1995, a small part of NSFnet returned to being a research network. The rest became known as the “Internet”.

What Is the Internet?

The Internet is a huge collection of computers connected in a communications network. The Transmission Control Protocol/Internet Protocol (TCP/IP) became the standard for computer network connections in 1982. Rather than connecting every computer on the Internet directly to every other computer on the Internet, normally the individual computers in an organization are connected to each other in a local network. One node on this local network is physically connected to the Internet. So, the Internet is actually a network of networks, rather than a network of computers. Obviously, all devices connected to the Internet must be uniquely identifiable.

Internet Protocol Addresses

The Internet Protocol (IP) address of a machine connected to the Internet is a unique 32-bit number (this is the IPv4 form of address). IP addresses usually are written (and thought of) as four 8-bit numbers, separated by periods. The four parts are separately used by Internet-routing computers to decide where a message must go next to get to its destination. Although people nearly always type domain names into their browsers, the IP address works just as well. For example, the IP address for United Airlines (www.ual.com) is 209.87.113.93. So, if a browser is pointed at http://209.87.113.93, it will be connected to the United Airlines Web site.
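The relationship between the dotted form and the single 32-bit number can be sketched in a few lines of Python. This is only an illustration: the function names dotted_to_int and int_to_dotted are made up for this example, and the standard ipaddress module is used only to cross-check the result.

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Combine the four 8-bit parts of a dotted address into one 32-bit number."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four parts, got {len(parts)}")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"part out of range for 8 bits: {part}")
        # Shift the bits collected so far left by 8 and append the new part.
        value = (value << 8) | octet
    return value


def int_to_dotted(value: int) -> str:
    """Split a 32-bit number back into four 8-bit parts separated by periods."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


address = "209.87.113.93"  # the example address used in the text
number = dotted_to_int(address)
print(number)
print(int_to_dotted(number))
# Cross-check against the standard library's own conversion.
print(int(ipaddress.IPv4Address(address)) == number)
```

The first part of the address ends up in the most significant 8 bits, which is why the parts are shifted left as each new one is read.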

Domain Names

IP addresses are numbers, so they are difficult for users to remember. To solve this problem, text-based names were introduced. These names are called domain names, and the system that manages them is the Domain Name System (DNS). A name begins with the name of the host machine, followed by progressively larger enclosing collections of machines, called domains. There may be two, three or more domain names. A domain name is of the form hostname.domainName.domainName

Example: movies.comedy.marxbros.com

Here, “movies” is the host name and “comedy” is a domain that is part of the “marxbros” domain, which in turn is part of the “com” domain. The host name and all of the domain names together are called the “fully qualified domain name”.

The steps for conversion from domain name to IP address:
1. The domain name has to be converted to an IP address before the destination can be reached, because messages on the Internet are routed using IP addresses only.
2. The conversion is done with the help of name servers.
3. As soon as a domain name is provided, it is sent across the Internet to a name server.
4. The name server is responsible for converting the domain name to an IP address.
5. If a name server is not able to convert the domain name, it contacts another name server.
6. This process continues until the IP address is found.
7. Once the IP address is known, the host can be accessed.
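The structure of a fully qualified domain name and the hand-off between name servers can be sketched as follows. This is a toy model, not how real DNS software is written: the NAME_SERVERS table, the server labels ("local", "regional", "authoritative") and the address 192.0.2.10 (taken from a range reserved for documentation) are all assumptions made up for illustration.

```python
def split_domain_name(fqdn: str):
    """Split a fully qualified domain name into host name and enclosing domains."""
    host, *domains = fqdn.split(".")
    return host, domains


# Hypothetical name servers. Each one knows some names and, if it cannot
# answer, names the next server to contact (None means there is no one left).
NAME_SERVERS = {
    "local": {"known": {}, "next": "regional"},
    "regional": {"known": {}, "next": "authoritative"},
    "authoritative": {
        "known": {"movies.comedy.marxbros.com": "192.0.2.10"},
        "next": None,
    },
}


def resolve(fqdn: str, start: str = "local"):
    """Ask name servers in turn until one can convert the name to an IP address."""
    asked = []
    server = start
    while server is not None:
        asked.append(server)
        entry = NAME_SERVERS[server]
        if fqdn in entry["known"]:
            return entry["known"][fqdn], asked
        server = entry["next"]  # this server could not answer: contact another
    raise LookupError(f"no name server could resolve {fqdn}")


print(split_domain_name("movies.comedy.marxbros.com"))
print(resolve("movies.comedy.marxbros.com"))
```

In a real program the lookup would normally be delegated to the operating system's resolver rather than modelled by hand.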

The World Wide Web (Origins)

Tim Berners-Lee and his group proposed a new protocol for the Internet whose intention was to allow scientists around the world to use the Internet to exchange documents describing their work. The proposed new system was designed to allow a user anywhere on the Internet to search for and retrieve documents from the databases on any number of different document-serving computers. The system used hypertext, which is text with embedded links to text in other documents, to allow non-sequential browsing of textual material. The units of the Web are referred to as pages, documents and resources. The Web is merely a vast collection of documents, some of which are connected by links. These documents can be accessed by web browsers and are provided by web servers.

Web or Internet?

It is important to understand that the Internet and the Web are not the same thing. The Internet is a collection of computers and other devices connected by equipment that allows them to communicate with each other. The Web is a collection of software and protocols that has been installed on most, if not all, of the computers on the Internet.

WEB BROWSERS

Documents provided by servers on the Web are requested by browsers, which are programs running on client machines. They are called browsers because they allow the user to browse the resources available on servers. Mosaic was the first widely used browser with a graphical user interface. A browser is a client on the Web because it initiates the communication with a server, which waits for a request from the client before doing anything. In the simplest case, a browser requests a static document from a server. The server locates the document among its servable documents and sends it to the browser, which displays it for the user. Sometimes a browser directly requests the execution of a program stored on the server. The output of the program is then returned to the browser.

Examples: Internet Explorer, Mozilla Firefox, Netscape Navigator, Google Chrome, Opera, etc.

WEB SERVERS

Web servers are programs that provide documents to requesting browsers. Example: Apache.

Web server operations:

All communications between a web client and a web server use HTTP. When a web server begins execution, it informs the operating system under which it is running that it is ready to accept incoming network connections through a specific port, and it then runs as a background process. A web client, or browser, opens a network connection to a web server, sends information requests and possibly data to the server, receives information from the server, and closes the connection. The primary task of a web server is to monitor a communication port on its host machine, accept HTTP commands through that port, and perform the operations specified by the commands. When a URL is received, it is translated into either a filename or a program name.

General characteristics of web servers:

The file structure of a web server has two separate directories. The root of one of these is called the document root, which stores the web documents. The root of the other is called the server root, which stores the server and its support software. The files stored directly in the document root are those available to clients through top-level URLs. The secondary areas from which documents can be served are called virtual document trees. Many servers can support more than one site on a computer, potentially reducing the cost of each site and making their maintenance more convenient. Such secondary hosts are called virtual hosts. Some servers can serve documents that are in the document root of other machines on the Web; in this case they are called proxy servers.
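The translation of a received URL into a filename under the document root can be sketched as below. This is a simplified illustration, not the logic of Apache or any particular server: the function name url_to_filename, the document root /var/www/html, and the default file name index.html are all assumptions chosen for the example.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def url_to_filename(url: str,
                    document_root: str = "/var/www/html",  # assumed location
                    index: str = "index.html") -> str:     # assumed default file
    """Translate the path part of a URL into a filename under the document root."""
    path = unquote(urlsplit(url).path)
    # Drop empty segments (from repeated slashes) and "." segments.
    parts = [p for p in path.split("/") if p not in ("", ".")]
    # Refuse ".." so that a request cannot climb out of the document root.
    if ".." in parts:
        raise ValueError("path escapes the document root")
    # A URL that names only a directory is served from a default document.
    if not parts or path.endswith("/"):
        parts.append(index)
    return posixpath.join(document_root, *parts)


print(url_to_filename("http://www.dte.kar.nic.in/results.php"))
print(url_to_filename("http://www.rnsit.ac.in/"))
```

A real server would go on to decide whether the resulting name is a static document to send or a program to execute, which this sketch leaves out.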

UNIFORM RESOURCE LOCATORS

Uniform Resource Locators (URLs) are used to identify different kinds of resources on the Internet. If a web browser wants a document from a web server, giving just the domain name is not sufficient, because the domain name can only be used for locating the server. It carries no information about which document the client needs. Therefore, a URL should be provided. The general format of a URL is:

scheme:object-address

Example: http://www.dte.kar.nic.in/results.php

The scheme indicates the protocol being used (for example http, ftp, telnet, or file). In the case of http, the full form of the object address of a URL is as follows:

//fully-qualified-domain-name/path-to-document

URLs can never have embedded spaces. Certain special characters, such as semicolons, ampersands and colons, also cannot appear literally in the path. To include a space or one of these characters, it must be coded as a percent sign (%) followed by its two-digit hexadecimal ASCII code (for example, %20 for a space). The path to the document for the http protocol is a sequence of directory names and a filename, all separated by slashes. URLs are written with forward slashes, even though file paths on a Windows server use backward slashes. The path in a URL can differ from a path to a file because a URL need not include all directories on the path. A path that includes all directories along the way is called a complete path.

Example: http://www.dte.kar.nic.in/index.html

In most cases, the path to the document is relative to some base path that is specified in the configuration files of the server. Such paths are called partial paths.

Example: http://www.rnsit.ac.in/
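The parts of a URL, and the encoding of characters that may not appear literally, can be explored with Python's standard urllib.parse module. The helper name describe_url and the file name "my results;final.html" are made up for this illustration.

```python
from urllib.parse import quote, urlsplit


def describe_url(url: str) -> dict:
    """Break a URL into its scheme, fully qualified domain name, and path."""
    parts = urlsplit(url)
    return {"scheme": parts.scheme, "host": parts.hostname, "path": parts.path}


print(describe_url("http://www.dte.kar.nic.in/results.php"))

# A space and a semicolon cannot appear literally in the path, so they are
# written as a percent sign followed by their hexadecimal ASCII codes.
print(quote("my results;final.html"))
```

Note that the second example URL in the text, http://www.rnsit.ac.in/, has the path "/" only, so the server decides which document to return.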

MULTIPURPOSE INTERNET MAIL EXTENSIONS

MIME stands for Multipurpose Internet Mail Extensions. Apart from sending the requested document, the server also sends MIME information. The MIME information is used by the web browser to render the document properly. The format of a MIME type is:

type/subtype

Example: text/html, text/plain, image/jpeg, video/mpeg

When the type is either text or image, the browser renders the document without any problem.

However, if the type is video or audio, the browser may be unable to render the document on its own. It then has to take the help of other software, such as a media player or Winamp. Such software is called a helper application or plug-in. Non-textual information of this kind is known as hypermedia.

Experimental document types are used when a user wants to create a customized kind of information and make it available on the Internet. The format of an experimental document type is:

type/x-subtype

Example: database/x-xbase, video/x-msvideo
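A MIME type can be taken apart in a few lines of Python. The sketch below follows the simplified rule stated above (text and image are rendered directly, other types go to a helper); real browsers handle many more types themselves. The function names parse_mime and browser_can_render are made up for this example, while mimetypes.guess_type is a standard-library call that guesses a type from a file name.

```python
import mimetypes


def parse_mime(mime: str):
    """Split 'type/subtype' and report whether the subtype is experimental (x-)."""
    type_, sep, subtype = mime.strip().partition("/")
    if not sep or not type_ or not subtype:
        raise ValueError(f"not of the form type/subtype: {mime!r}")
    return type_, subtype, subtype.startswith("x-")


def browser_can_render(mime: str) -> bool:
    """Simplified rule from the text: only text and image types are rendered directly."""
    return parse_mime(mime)[0] in ("text", "image")


for mime in ("text/html", "image/jpeg", "video/mpeg", "video/x-msvideo"):
    print(mime, parse_mime(mime), browser_can_render(mime))

# A server typically picks the MIME type from the file name extension.
print(mimetypes.guess_type("index.html")[0])
```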

THE HYPERTEXT TRANSFER PROTOCOL

Request Phase:

The general form of an HTTP request is as follows:
1. HTTP method, path part of the URL, HTTP version
2. Header fields
3. Blank line
4. Message body

The following is an example of the first line of an HTTP request:

GET /storefront.html HTTP/1.1

Table 1.1 HTTP request methods

GET and POST are the most frequently used methods. After the first line, any number of header fields can be included. The format of a header field is the field name followed by a colon and the value of the field. There are four categories of header fields:
General: For general information, such as the date
Request: Included in request headers
Response: For response headers
Entity: Used in both request and response headers

One common request field is the Accept field, which specifies a MIME type that the browser is willing to accept. For example:
Accept: text/plain
Accept: image/gif
Accept: text/html

A wildcard character, the asterisk (*), can be used to specify that part of a MIME type can be anything. For example, text/* matches any text subtype. The Host: hostname request field gives the name of the host. The Host field is required for HTTP 1.1. The If-Modified-Since: date request field specifies that the requested file should be sent only if it has been modified since the given date. If the request has a body, the length of that body must be given with a Content-length field. The header of a request must be followed by a blank line, which is used to separate the header from the body of the request.
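The request format described above can be assembled by hand to see how the pieces fit together. This is a minimal sketch, not a full HTTP client: build_request is a made-up helper, and www.example.com is a placeholder host name, not one taken from the text.

```python
def build_request(method: str, path: str, host: str,
                  headers: dict = None, body: str = "") -> str:
    """Assemble an HTTP/1.1 request: request line, header fields, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]  # Host is required in 1.1
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # The length of the body, in bytes, must be declared in a header field.
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # Lines end with CR LF; the empty line separates the header from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


request = build_request("GET", "/storefront.html", "www.example.com",
                        {"Accept": "text/*"})
print(request)
```

Header field names are not case sensitive, so Content-Length and Content-length name the same field.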

The Response Phase:

The general form of an HTTP response is as follows:
1. Status line
2. Response header fields
3. Blank line
4. Response body

The status line includes the HTTP version used, a three-digit status code for the response, and a short textual explanation of the status code. For example, most responses begin with the following:

HTTP/1.1 200 OK

The status codes begin with 1, 2, 3, 4, or 5. The general meanings of the five categories specified by these first digits are shown in Table 1.2.

Table 1.2 First digits of HTTP status codes

One of the more common status codes is one that users never want to see: 404 Not Found, which means the requested file could not be found. The code 200 means the request was handled without error. The code 500 means the server encountered a problem and was not able to fulfill the request. After the status line, the server sends a response header, which contains several lines of information about the response. The only essential field of the header is Content-type. The response header must be followed by a blank line, and the response data follows the blank line.
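The response format can be taken apart in the same spirit. The sketch below is illustrative only: parse_response is a made-up helper, the sample response text is invented, and the category names in STATUS_CLASSES are the conventional names for the five first digits, supplied here as an assumption because the table itself is not reproduced in this text.

```python
# Conventional meanings of the first digit of a status code (assumed names).
STATUS_CLASSES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client error",
    "5": "Server error",
}


def parse_response(raw: str) -> dict:
    """Split a response into status line, header fields, and body."""
    # The first blank line separates the header from the response data.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "version": version,
        "code": int(code),
        "reason": reason,
        "category": STATUS_CLASSES[code[0]],
        "headers": headers,
        "body": body,
    }


sample = "HTTP/1.1 200 OK\r\nContent-type: text/html\r\n\r\n<html></html>"
print(parse_response(sample))
```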

SECURITY

Security is one of the major concerns on the Internet. A server can be accessed easily with basic hardware, an Internet connection and a web browser, so a client may be able to retrieve very important information from the server. Similarly, a server can introduce viruses on the client system. These viruses can damage the software and data on the client.

While programming for the Web, the following requirements should be considered:
Privacy: the message should be readable only by the communicating parties and not by an intruder.
Integrity: the message should not be modified during transmission.

Authentication: the communicating parties must be able to verify each other’s identity.
Non-repudiation: it should be possible to prove that a message was sent and received, so that neither party can later deny it.

Security can be provided using cryptographic algorithms, for example private-key and public-key encryption. Protection against viruses and worms is provided by antivirus software, which must be updated frequently so that it can detect and protect against the continuous stream of new viruses and worms.
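As one small illustration of the private-key idea, the sketch below uses a key shared by both parties to attach a tag to a message, so the receiver can detect modification (integrity) and be confident the sender knew the key (authentication). It does not provide privacy or non-repudiation. The technique shown, HMAC with SHA-256 from Python's standard library, is chosen here as an example and is not named in the text; the key and messages are made up.

```python
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> str:
    """Compute a tag over the message that only holders of the shared key can produce."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in a way that does not leak timing information."""
    return hmac.compare_digest(sign(message, key), tag)


key = b"shared-secret"            # hypothetical key known to both parties
tag = sign(b"pay 10", key)

print(verify(b"pay 10", key, tag))     # unmodified message
print(verify(b"pay 1000", key, tag))   # message altered in transit
```

Because both parties hold the same key, either of them could have produced the tag, which is why non-repudiation needs public-key techniques instead.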

THE WEB PROGRAMMER’S TOOLBOX

Web programmers use several languages to create the documents that servers can provide to browsers. The most basic of these is XHTML, the standard mark-up language for describing how Web documents should be presented by browsers. Tools that can be used without specific knowledge of XHTML are available to create XHTML documents. A plug-in is a program that can be integrated with a word processor to make it possible to use the word processor to create XHTML. A filter converts a document written in some other format to XHTML.

 XML is a meta-mark-up language that provides a standard way to define new mark-up languages. JavaScript is a client-side scripting language that can be embedded in XHTML to describe simple computations. JavaScript code is interpreted by the browser on the client machine; it provides access to the elements of an XHTML document, as well as the ability to change those elements dynamically.  Flash is a framework for building animation into XHTML documents. A browser must have a Flash player plug-in to be able to display the movies created with the Flash framework.  Ajax is an approach to building Web applications in which partial document requests are handled asynchronously. Ajax can significantly increase the speed of user interactions, so it is most useful for building systems that have frequent interactions.

 PHP is the server-side equivalent of JavaScript. It is an interpreted language whose code is embedded in XHTML documents. PHP is used primarily for form processing and database access from browsers.  Servlets are server-side Java programs that are used for form processing, database access, or building dynamic documents. JSP documents, which are translated into servlets, are an alternative approach to building these applications. JSF is a development framework for specifying forms and their processing in JSP documents.  ASP.NET is a Web development framework. The code used in ASP.NET documents, which is executed on the server, can be written in any .NET programming language.

Ruby is a relatively recent object-oriented scripting language that is introduced here primarily because of its use in Rails, a Web applications framework. Rails provides a significant part of the code required to build Web applications that access databases, allowing the developer to spend his or her time on the specifics of the application without the drudgery of dealing with all of the housekeeping details.

HTML versus XHTML

HTML: HTML is much easier to write. Because of the huge number of HTML documents available on the Web, browsers will continue to support HTML as far as one can see into the future. HTML has few syntactic rules, and HTML processors (e.g., browsers) do not enforce the rules it does have. Therefore, HTML authors have a high degree of freedom to use their own syntactic preferences to create documents. Because of this freedom, HTML documents lack consistency, both in low-level syntax and in overall structure. Used for displaying the data.

XHTML: XHTML requires a level of discipline many of us naturally resist. Some older browsers have problems with some parts of XHTML. XHTML has strict syntactic rules that impose a consistent structure on all XHTML documents. Another significant reason for using XHTML is that when you create an XHTML document, its syntactic correctness can be checked, either by an XML browser or by a validation tool. Used for describing the data.
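The point about checking syntactic correctness can be tried with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; is_well_formed is a made-up helper, and the check covers only XML well-formedness (matching, properly nested tags), not full validation against the XHTML rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, i.e. every tag is closed and nested."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# XHTML style: the empty element is explicitly closed.
print(is_well_formed("<p>Line one<br />Line two</p>"))

# Loose HTML style: <br> is never closed, which an XML parser rejects
# even though browsers display it without complaint.
print(is_well_formed("<p>Line one<br>Line two</p>"))
```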
