Fundamentals of Web
• Fundamentals of Web: Internet, WWW, Web Browsers and Web Servers, URLs,
MIME, HTTP, Security, The Web Programmer's Toolbox. Introduction to
XHTML: Origins and evolution of HTML and XHTML, Basic syntax, Standard
XHTML document structure, Basic text markup, Images, Hypertext Links, Lists,
Tables, Forms, Frames, Syntactic differences between HTML and XHTML.
Internet
• The Internet is a vast global network of interconnected computers and devices.
• It facilitates the transfer of data through standardized protocols like TCP/IP.
• Services provided via the Internet include:
• Email
• File sharing
• Voice and video communication
• Web access (via the WWW)
• Key Characteristics:
• A physical infrastructure consisting of cables, satellites, routers, and servers.
• Acts as the backbone for services like the WWW, cloud computing, and online
applications.
History of Internet
• In the late 1960s, the Advanced Research Projects Agency (ARPA) sponsored a
conference at which several dozen ARPA-funded graduate students were brought
together at the University of Illinois at Urbana-Champaign to meet and share ideas.
• During this conference, ARPA rolled out the blueprints for networking the
main computer systems of about a dozen ARPA-funded universities and
research institutions. They were to be connected with communications lines
operating at a then-stunning 56 Kbps (i.e., 56,000 bits per second), at a
time when most people (of the few who could) were connecting over
telephone lines to computers at a rate of 110 bits per second.
• Paul Baran proposed a distributed network based on data in message blocks
in the early 1960s.
• There was great excitement at the conference. Researchers at Harvard
talked about communicating with the UNIVAC 1108 at the University of
Utah to handle calculations related to their computer graphics research.
• Many other interesting possibilities were raised. Academic research was about
to take a giant leap forward. Shortly after this conference, ARPA
proceeded to implement the ARPANET, which eventually evolved into
today's Internet.
• Donald Davies conceived of packet switching in 1965 at the National
Physical Laboratory (NPL) and proposed building a national commercial data
network in the UK.
• In 1969, the Department of Defense (DoD) of the USA created
a small network of four computers called ARPANET (Advanced Research
Projects Agency Network).
• This network was set up for military purposes.
• The primary goal of ARPANET was to allow multiple users to send and
receive information simultaneously over the same communication path.
• The network operated with a technique called packet switching.
• Packet switching is a method of grouping data that is transmitted over a digital network into
packets.
• The packets contained address, error-control, and sequencing information.
• The address information allowed packets to be routed to their destinations.
• The sequencing information helped in reassembling the packets, which, because of complex
routing mechanisms, could actually arrive out of order, into their original order for
presentation to the recipient.
• Packets from different senders were intermixed on the same lines.
• This packet-switching technique greatly reduced transmission costs compared with the cost
of dedicated communications lines.
• The network was designed to operate without centralized control.
• If a portion of the network failed, the remaining working portions would still route packets
from senders to receivers over alternative paths, for reliability.
• The protocol for communicating over the ARPANET became known as the Transmission
Control Protocol (TCP). TCP ensured that messages were properly routed from sender to
receiver and that they arrived intact.
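The packet-switching idea described above can be illustrated with a small sketch. It is not how ARPANET or TCP were actually implemented: the Packet class, the packetize and reassemble helpers, the 8-byte packet size and the CRC-32 checksum are all assumptions chosen for illustration. It only shows a message being split into packets that carry address, error-control and sequencing information, arriving out of order, and being put back together.

```python
import random
import zlib
from dataclasses import dataclass


@dataclass
class Packet:
    destination: str   # address information used for routing
    sequence: int      # position of this chunk in the original message
    checksum: int      # error-control value (CRC-32 of the payload)
    payload: bytes


def packetize(message: bytes, destination: str, size: int = 8) -> list[Packet]:
    """Split a message into packets of at most `size` bytes."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [Packet(destination, seq, zlib.crc32(chunk), chunk)
            for seq, chunk in enumerate(chunks)]


def reassemble(packets: list[Packet]) -> bytes:
    """Check every packet for corruption, then restore the original order."""
    for packet in packets:
        if zlib.crc32(packet.payload) != packet.checksum:
            raise ValueError(f"packet {packet.sequence} is corrupted")
    ordered = sorted(packets, key=lambda packet: packet.sequence)
    return b"".join(packet.payload for packet in ordered)


message = b"Packets may arrive out of order."
packets = packetize(message, destination="host-b")
random.shuffle(packets)   # simulate packets taking different routes
print(reassemble(packets) == message)
```

Because each packet carries its own sequence number, the receiver does not care which route a packet took or when it arrived.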
WWW
• The WWW, or simply the Web, is a service that runs on the Internet.
• It allows users to access and share hyperlinked content using HTTP/HTTPS
protocols.
• Content on the Web is organized as web pages, which are interconnected
through hyperlinks.
• Key Components:
• Web pages: Documents formatted in HTML, which can include text, images,
videos, and links.
• Hyperlinks: Allow navigation between different web pages.
• Web protocols: HTTP (Hypertext Transfer Protocol) and HTTPS (secured HTTP).
• The WWW is the part of the Internet that supports multimedia and consists of a
collection of linked documents.
• The Web is a system of interlinked hypertext documents accessed via the Internet.
• The Web is an application that uses the Internet for communication, with TCP/IP as
the underlying transport mechanism.
• The Web is a huge collection of pages of information linked to each other around
the globe.
• The Web uses HTTP to transmit data.
• Web pages are stored on a web server, which sends them to a client computer as
and when it requests them.
• Internally, a web page is a computer file stored on the disk of the server. The file
contains tags written in a codified form. These tags decide how the file will
look when displayed on the screen. The address of a website is called a Uniform
Resource Locator (URL).
• A website is a collection of web pages. The pages of a website are stored
digitally on a web server.
• The WWW is a hypertext-based system that provides a uniform and user-friendly
interface for accessing the resources on the Internet.
• It is an information space in which the items of interest, referred to as
resources, are identified by global identifiers called Uniform Resource
Identifiers (URIs).
• The architecture of the WWW is two-tiered. It consists of the client and
the server. The client (web browser) requests a web page, and this page
is retrieved from the server. The architecture depends on three key
standards:
• HTML for encoding document content,
• Uniform Resource Locator (URL) for naming remote information objects in
a global namespace, and
• HTTP for staging the transfer.
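To make the link between HTML documents and URLs concrete, here is a minimal sketch that pulls the hyperlinks out of a page and resolves them to full URLs, using only Python's standard library. The LinkCollector class, the sample page and the www.example.com base address are made up for illustration; a real browser does far more than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the target URL of every <a href="..."> in a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


page = """
<html><body>
  <p>See the <a href="/courses/">course list</a> and
     <a href="https://www.w3.org/">the W3C</a>.</p>
</body></html>
"""

collector = LinkCollector("http://www.example.com/home/")
collector.feed(page)
print(collector.links)
```

Each collected URL names another resource that a client could request over HTTP, which is what makes the Web a web of linked documents.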
Web Browsers
• A Web Browser is a software application used to access the WWW.
• It retrieves web pages from a Web Server and displays them on the user's device.
• Examples: Google Chrome, Mozilla Firefox, Safari, Microsoft Edge, Opera.
• Core Functions:
• URL input: Users enter a URL (Uniform Resource Locator), which directs the
browser to a specific web page.
• Rendering engine: Converts HTML, CSS, and other web content into a visual
format.
• Caching: Stores parts of web pages locally to improve load times.
• Extensions/plugins: Add features like ad-blocking or password management.
• A web browser (commonly referred to as a browser) is a software application for
accessing information on the World Wide Web.
• When a user requests a web page from a particular website, the web browser retrieves
the necessary content from a web server and then displays the page on the user's device.
• A web browser is not the same thing as a search engine, though the two are often
confused. For a user, a search engine is just a website that provides links to other websites.
However, to connect to a website's server and display its web pages, a user must have a
web browser installed.
• Some well-known browsers, past and present, are:
• Google Chrome
• Mozilla Firefox
• Internet Explorer (discontinued)
• Microsoft Edge
• Opera
• Netscape Navigator (discontinued)
• Safari
• Function
• The purpose of a web browser is to fetch information resources from the Web
and display them on a user's device.
• This process begins when the user inputs a Uniform Resource Locator (URL)
such as https://www.apcas.in/ into the browser.
• Virtually all URLs on the Web start with either http or https, which means the
browser will retrieve them with the Hypertext Transfer Protocol (HTTP).
• In the case of https, the communication between the browser and the web
server is encrypted for the purposes of security and privacy.
• Web browsers can typically be configured through a built-in menu. Depending on
the browser, the menu may be named Settings, Options, or Preferences.
• The menu has different types of settings. For example, users can change their
home page and default search engine. They can also change default web page
colors and fonts. Various network connectivity and privacy settings are also
usually available.
Web Servers
• A Web Server is a system (hardware or software) that stores and delivers web content to
clients (browsers) over the Internet.
• When you request a web page by entering a URL in a browser, the browser sends this
request to a web server, which then sends back the requested files.
• Key Functions:
• Hosting websites: Storing web pages, scripts, and multimedia files.
• Handling requests: Processing HTTP/HTTPS requests from browsers.
• Dynamic content generation: Often integrates with databases and server-side scripts to
create personalized web content.
• Examples of Web Servers:
• Apache HTTP Server
• Nginx
• Microsoft Internet Information Services (IIS)
• A web server is a computer that is dedicated to providing web services to clients
on the Internet.
• It uses HTTP (Hypertext Transfer Protocol) and other protocols to respond to
client requests made over the World Wide Web.
• The main job of a web server is to serve website content by storing,
processing and delivering web pages to users.
• Web servers host websites, but there are other kinds of servers as well,
such as gaming, storage, FTP and email servers.
• Apache HTTP Server: developed by the Apache Software Foundation, this was for
many years the most popular web server in the world.
• Apache is open-source software and can be installed on almost all
operating systems, including Linux, Unix, Windows, FreeBSD, Mac OS X and more.
• Historically, about 60% of web server machines ran the Apache Web Server.
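The request-and-response role of a web server can be sketched with Python's built-in http.server module. This is a toy for illustration, not a substitute for Apache, Nginx or IIS. The PAGES dictionary, the PageHandler class and the fetch and demo helpers are invented names, and the server listens only on the local machine on whatever free port the operating system assigns.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical site content: URL path -> page body.
PAGES = {"/": b"<html><body><h1>Home</h1></body></html>"}


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        status = 200 if body is not None else 404
        if body is None:
            body = b"<html><body>Not Found</body></html>"
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(port: int, path: str) -> tuple[int, bytes]:
    """Play the role of the browser: send one GET request."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


def demo() -> list[int]:
    """Start the server, request one stored page and one missing page."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), PageHandler)  # port 0 = any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return [fetch(port, "/")[0], fetch(port, "/missing")[0]]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The handler stores content, processes each request and delivers a response with a status code, which mirrors the storing, processing and delivering functions listed above.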
Uniform Resource Locator (URL)
• Every web page has a unique address called a URL (Uniform Resource
Locator), which identifies where it is located on the Web.
• For example, the URL for the APCAS home page is http://www.apcas.in/home/
• The basic parts of a URL often provide clues to where a web page originates
and who might be responsible for the information at that page or site.
• URLs have three basic parts: the protocol, the server name and the
resource ID.
• Look again at APCAS's URL below to see these three parts.
• The protocol is shown at the beginning of the URL, before the double
slash; the server name is between the double slash and the first single
slash; and the resource ID is everything after the first single slash (/).
• http://www.apcas.in/home/
• protocol: http | server name: www.apcas.in | resource ID: home/
• Let's examine each part of this URL. First part: protocol (http)
• The protocol identifies the method (set of rules) by which the
resource is transmitted. All web pages use the Hypertext Transfer Protocol
(HTTP). Thus, all web URLs begin with http (or https for the secure variant).
• Second part: server name
• The server name identifies the computer on which the resource is found.
• This part of the URL commonly identifies which company, agency or organization may be
either directly responsible for the information, or is simply providing the computer space
where the information is stored.
• Web server names often begin with the letters www, but not always.
• The server name ends with a dot followed by a short extension, traditionally of three
or two letters, called the domain (more precisely, the top-level domain).
• The domain is important because it usually identifies the type of organization that created
or sponsored the resource. Sometimes it indicates the country where the server is located.
• The most common domains are:
• com: identifies company or commercial sites
• org: for non-profit organization sites
• edu: for educational sites
• gov: for government sites
• net: for Internet service providers or other types of networks
• If the domain is two letters, it identifies a country, e.g.
• us for the United States,
• uk for the United Kingdom,
• au for Australia,
• mx for Mexico or
• ca for Canada
• The server name for our college website is www.apcas.in. The server
name may also be the name of a website.
• Websites can be either all of the pages on one server (computer) or all
of the pages under a specific subdirectory on a server.
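The three parts of a URL described in this section can be pulled apart with Python's standard urllib.parse module. This is a small sketch using the college URL from the notes; the variable names are chosen here to match the notes' terminology. Note that the library reports the path with its leading slash.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.apcas.in/home/")

protocol = parts.scheme        # the text before "://"
server_name = parts.hostname   # between the double slash and the first single slash
resource_id = parts.path       # the first single slash and everything after it
domain = server_name.rsplit(".", 1)[-1]   # the extension after the last dot

print(protocol, server_name, resource_id, domain)
```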
Hypertext Transfer Protocol (HTTP)
• Hypertext Transfer Protocol (HTTP) is an Internet communication protocol
used to send and receive web pages and files on the Internet.
• HTTP defines how messages are formatted and transmitted, and what actions
web servers and browsers should take in response to various commands.
• For example, when the user enters a URL in a browser, the browser sends an HTTP
command to the web server directing it to fetch and transmit the requested
web page.
• HTTP was designed as a connectionless, text-based protocol. Clients (web browsers) send
requests to web servers for web elements such as web pages and images.
• In the original design (HTTP/1.0), after the request is serviced by a server, the
connection between client and server across the Internet is closed.
• A new connection must then be made for each request. HTTP/1.1 and later keep the
connection open by default so that it can be reused for further requests.
• The server must be located using a URL or URI. This always contains
http at the start.
• It normally connects to port 80 on a computer.
• A more secure version of HTTP is called HTTPS (Hypertext Transfer
Protocol Secure).
• This contains https at the beginning of the URL. It encrypts all the
information that is sent and received.
• This can stop malicious users such as hackers from stealing the
information, and is often used on payment websites.
• HTTPS uses port 443 for communication instead of port 80.
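Because HTTP messages are plain text, their format is easy to show. The sketch below builds the text of a simple GET request for a URL, picks the default port from the scheme (80 for http, 443 for https), and reads the status line of a response. The helper names build_get_request and parse_status_line are made up for this example, the response is a canned string rather than real server output, and the query string is ignored to keep things short.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def build_get_request(url: str) -> tuple[str, int, str]:
    """Return (host, port, request text) for a simple HTTP/1.1 GET."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    path = parts.path or "/"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"                      # a blank line ends the header section
    )
    return parts.hostname, port, request


def parse_status_line(response: str) -> tuple[str, int, str]:
    """Split the first line of a response into version, code and reason."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


host, port, request = build_get_request("http://www.apcas.in/home/")
print(host, port)
print(request)

# A canned response, standing in for what a server might send back.
sample_response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"
print(parse_status_line(sample_response))
```

The "Connection: close" header asks the server to close the connection after this one response, which reproduces the one-request-per-connection behaviour described above.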
Multipurpose Internet Mail Extensions (MIME)
• Multipurpose Internet Mail Extensions (MIME) is a standard that was proposed
by Bell Communications Research (Bellcore) in 1991 in order to expand the limited
capabilities of email.
• MIME is a kind of add-on or supplementary protocol that allows non-ASCII
data to be sent through SMTP. It allows users to exchange
different kinds of data files on the Internet: audio, video, images and
application programs as well.
• Email messages with MIME formatting are typically transmitted with
standard protocols, such as the Simple Mail Transfer Protocol (SMTP),
the Post Office Protocol (POP) and the Internet Message Access
Protocol (IMAP).
Need for MIME
• Limitations of Simple Mail Transfer Protocol (SMTP):
• SMTP has a very simple structure.
• Its simplicity, however, comes at a price, as it only sends messages in
Network Virtual Terminal (NVT) 7-bit ASCII format.
• It cannot carry text in languages whose characters fall outside 7-bit ASCII,
such as French, German, Russian, Chinese and Japanese, so such text
cannot be transmitted using plain SMTP. MIME extends SMTP to make it more
broadly usable.
• It cannot be used to send binary files or video or audio data.
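Python's standard email package can show how MIME works around these limits. The sketch below builds a message with non-ASCII text and a binary attachment under a 7-bit policy, then checks that what would travel over SMTP is pure ASCII and that the binary data survives the round trip. The addresses, subject and file name are placeholders.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Assume a transport that can only carry 7-bit ASCII, as described above.
seven_bit = policy.default.clone(cte_type="7bit")

msg = EmailMessage(policy=seven_bit)
msg["From"] = "sender@example.com"       # placeholder addresses
msg["To"] = "receiver@example.com"
msg["Subject"] = "MIME demo"

# Non-ASCII text: the library picks a charset and an ASCII-safe encoding.
msg.set_content("Grüße aus München")

# Binary data: attached as its own part and base64-encoded for transport.
binary_data = bytes(range(256))
msg.add_attachment(binary_data, maintype="application",
                   subtype="octet-stream", filename="data.bin")

wire_form = msg.as_bytes()               # what would be handed to SMTP
print(msg.get_content_type())
print(wire_form.isascii())

# The receiving side decodes the parts back to their original form.
received = message_from_bytes(wire_form, policy=policy.default)
attachment = next(received.iter_attachments())
print(attachment.get_content_type(), attachment.get_content() == binary_data)
```

The Content-Type header of each part tells the receiver what kind of data it holds, and the Content-Transfer-Encoding header tells it how to undo the ASCII-safe encoding.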
Web Security
Web application security is a branch of information security that deals
specifically with security of websites, web applications and web services. At a high
level, web application security draws on the principles of application security
but applies them specifically to internet and web systems.
The global nature of the Internet exposes web properties to attack from
different locations and various levels of scale and complexity. Web application
security deals specifically with the security surrounding websites, web
applications and web services such as APIs.
Let's explore some of the common methods of attack, or "vectors", that are commonly exploited:
• Cross-site scripting (XSS) - XSS is a vulnerability that allows an attacker to inject
client-side scripts into a webpage in order to access important information directly,
impersonate the user, or trick the user into revealing important information.
• SQL injection (SQLi) - SQLi is a method by which an attacker exploits vulnerabilities in
the way an application builds the queries its database executes. Attackers use SQLi to gain
access to unauthorized information, modify or create new user permissions, or otherwise
manipulate or destroy sensitive data.
• Memory corruption - Memory corruption occurs when a location in memory is unintentionally
modified, resulting in the potential for unexpected behavior in the software. Bad actors will attempt
to sniff out and exploit memory corruption through exploits such as code injections or buffer
overflow attacks.
• Buffer overflow - Buffer overflow is an anomaly that occurs when software writes more data to a
defined space in memory, known as a buffer, than the buffer can hold. Overflowing the buffer's capacity
results in adjacent memory locations being overwritten with data. This behavior can be exploited to
inject malicious code into memory, potentially creating a vulnerability in the targeted machine.
• Cross-site request forgery (CSRF) - Cross-site request forgery involves tricking a victim into making a
request that utilizes their authentication or authorization. By leveraging the account privileges of a
user, an attacker is able to send a request masquerading as the user. Once a user's account has been
compromised, the attacker can exfiltrate, destroy or modify important information. Highly
privileged accounts such as administrators or executives are commonly targeted.
• Data breach - A data breach is the intentional or unintentional release of secure or
private/confidential information to an untrusted environment. Data breaches may involve personal
health information (PHI), personally identifiable information (PII), trade secrets or intellectual
property.
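Two of the vectors above, SQL injection and cross-site scripting, can be demonstrated safely in a few lines. The sketch below uses an in-memory SQLite database and Python's html module. The table, the user names and the attacker strings are invented for illustration, and the defences shown (query placeholders and output escaping) are common basic measures, not a complete security strategy.

```python
import html
import sqlite3

# --- SQL injection ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])

user_input = "nobody' OR '1'='1"   # attacker-controlled value

# Vulnerable: the input is pasted into the SQL text and changes its meaning.
unsafe_sql = f"SELECT name FROM users WHERE name = '{user_input}'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safer: a placeholder keeps the input as data, never as SQL.
safe = conn.execute("SELECT name FROM users WHERE name = ?",
                    (user_input,)).fetchall()

print(len(leaked), len(safe))

# --- Cross-site scripting ---
comment = "<script>alert('xss')</script>"   # attacker-supplied comment

# Escaping turns markup characters into harmless text before display.
page_fragment = f"<p>{html.escape(comment)}</p>"
print(page_fragment)
```

In the vulnerable query the attacker's quote characters turn the condition into one that is always true, so every row is returned; with the placeholder, the same input matches no user at all.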
The Web Programmer's Toolbox
• Web development tools are applications and software that help web developers
program, test, and debug code and interface elements to build a website or web
application. These tools don't actually build the website or web application; instead,
they make web development easier.
• A web programmer's toolbox includes tools for:
• Code editing: Tools like Visual Studio Code are free and can be used for web design
• Source control: Tools for managing source code
• Testing and debugging: Tools for testing and debugging code
• Documentation: Tools for creating documentation
• Networking and API: Tools for working with networks and APIs
• Project management: Tools for managing projects
• Some other tools that web developers use include Sublime Text, Bootstrap, GitHub,
and Chrome DevTools.
• When choosing tools, it's important to find a set that
meets the project's requirements and reduces
development time. For example, Eclipse with extensions
from Google might be a good choice for Android app
development.
• Document languages and programming languages that are
the building blocks of the web and web programming
• XHTML
• Plug-ins
• Filters
• XML
• JavaScript
• Java, Perl, Ruby, PHP
XHTML
• XHTML, or Extensible HyperText Markup Language, is a reformulation of HTML
as an XML application: very similar to HTML but stricter. Every XHTML document
must be well-formed XML, so the syntax rules have to be followed exactly,
whereas HTML tolerates many omissions. Most browsers support it. Think of
it as a more precise way to write web code.
• History
• It was developed by the World Wide Web Consortium (W3C) and helps web
developers transition from HTML to XML. With XHTML, developers can
enter the XML world with all its features while still ensuring backward and
future compatibility of the content. The XHTML family includes three
document types; the first is XHTML 1.0, which was recommended by W3C
on January 26, 2000. The second is XHTML 1.1, which was recommended
by W3C on May 31, 2001.
• The third is XHTML5, a standard used for developing an XML adaptation of
the HTML5 specification. An XHTML document must have an XHTML
<!DOCTYPE> declaration.
Elements of XHTML:
XHTML Element: Description
• <!DOCTYPE>: Used to declare the Document Type Definition (DTD), specifying the
rules for the markup language and ensuring proper rendering in browsers.
• <html>: Encloses the entire HTML or XHTML document, serving as the root element.
• <head>: Contains meta-information about the document, such as the title, character
set, linked stylesheets, and other essential elements.
• <title>: Nested within the head section, specifies the title of the document,
displayed in the browser's title bar or tab.
• <body>: Encloses the content of the web page, including text, images, links, and
other HTML elements. It represents the visible part of the document displayed in the
browser.
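The elements listed above fit together into a minimal XHTML document. Since XHTML is XML, a generic XML parser can check that such a document is well-formed. The sketch below does this with Python's xml.etree.ElementTree. The sample documents and the is_well_formed helper are made up for illustration, and the check covers well-formedness only; it does not validate the document against the DTD.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://www.w3.org/1999/xhtml"}   # the XHTML namespace

xhtml_doc = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Minimal XHTML page</title>
  </head>
  <body>
    <p>First line<br />second line</p>
  </body>
</html>"""

# HTML-style markup: <br> and <p> are left unclosed, which XML does not allow.
html_style_doc = ("<html><head><title>Loose</title></head>"
                  "<body><p>First line<br>second line</body></html>")


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(xhtml_doc)
title = root.find("x:head/x:title", NS)
print(title.text)
print(is_well_formed(xhtml_doc), is_well_formed(html_style_doc))
```

The second document is the kind of markup an HTML browser would display without complaint, but an XML parser rejects it at the first structural error.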
Feature: HTML / XHTML
• Definition
• HTML: Hypertext Markup Language is a markup language used to create web pages and
other information that can be displayed in a web browser.
• XHTML: Extensible Hypertext Markup Language is a markup language that is a stricter
version of HTML and conforms to XML syntax.
• Syntax
• HTML: Allows loose syntax, with end tags and attribute quotes often being optional.
• XHTML: Requires end tags for all elements and quotes around all attribute values.
• Document Type Definition (DTD)
• HTML: Has used several document types, such as the HTML 4.01 DTDs; HTML5 uses the
short <!DOCTYPE html> declaration, which references no DTD.
• XHTML: Requires the use of a specific DTD, such as XHTML 1.0 Strict or XHTML 1.1.
• Namespaces
• HTML: Does not support namespaces.
• XHTML: Supports namespaces, allowing for the integration of other XML languages.
• Attributes
• HTML: Allows the use of deprecated attributes.
• XHTML: The strict DTDs do not allow deprecated attributes, and all attribute
names must be lowercase.
• Deprecation
• HTML: Will continue to be supported by web browsers.
• XHTML: Support by web browsers is limited, and it is now largely replaced by HTML5.
• Future
• HTML: Continues to evolve, with the latest version being HTML5.
• XHTML: Development has largely been discontinued, with future developments
focusing on HTML5.
Key Differences Between HTML and XHTML
•Syntax: XHTML has a stricter syntax than HTML, meaning that it
must follow XML rules for proper formatting and structure. HTML,
on the other hand, is more flexible in its syntax.
•Document Type Definition (DTD): An XHTML 1.x document must carry a
DOCTYPE declaration that references a DTD, which defines the rules
for the structure of the document. HTML5 needs only the short
<!DOCTYPE html> declaration, with no DTD.
•Case sensitivity: XHTML is case sensitive, meaning that
element and attribute names must be in lower case. HTML is not case
sensitive.
•Empty Elements: In XHTML, all empty elements must be closed,
such as <br /> or <img src="image.jpg" alt="image" />. In HTML,
some empty elements can be left open, such as <br> or <img
src="image.jpg" alt="image">.
•Attribute values: In XHTML, all attribute values must be quoted,
while in HTML they can be either quoted or unquoted.
•Error handling: XHTML has more strict error handling, with
errors resulting in the page not being displayed properly. HTML is