What happens when you visit a website?
Have you ever wondered how the internet works? What happens when you visit a website? How is information sent and received online?
We’ll explain the structure of the internet (and its protocols) in a detailed yet easy-to-understand manner. 😊
Internet and IP addresses
The internet is a collection of undersea cables hundreds or thousands of miles long, ISP (Internet Service Provider) infrastructure, IXPs (Internet Exchange Points), and data centers.
Those cables allow us to send and receive data over long distances.
Data centers are collections of servers (interconnected computers) that host websites, apps and more.
“Outsourcing IT to a professional data center makes organizations more resilient to such equipment problems, rising operational costs and malicious parties. Data centers are optimized for accommodating this IT. They have cooling, connectivity, security and links with, for example, cloud providers that are increasingly difficult to realize within the own organization.” Taken from analyticsindiamag.
Likewise, servers are usually configured to avoid a Single Point of Failure (SPOF). Every part of the web infrastructure needs redundant components, so that the site keeps working if one server, load balancer or database malfunctions.
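To make the idea of redundancy concrete, here is a minimal sketch of failover: the caller tries each redundant replica in turn and only fails if all of them are down. The function and server names are hypothetical and stand in for real network calls.

```python
from typing import Callable, Sequence


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each redundant replica in order and return the first successful answer."""
    errors = []
    for replica in replicas:
        try:
            return replica()
        except ConnectionError as exc:
            # This replica is down; remember why and move on to the next one.
            errors.append(exc)
    raise ConnectionError(f"all {len(replicas)} replicas failed: {errors}")


def broken_server() -> str:
    # Hypothetical replica that is currently unreachable.
    raise ConnectionError("server-1 is unreachable")


def healthy_server() -> str:
    # Hypothetical replica that answers normally.
    return "response from server-2"


print(call_with_failover([broken_server, healthy_server]))
```

With a single replica, the first failure would be fatal; with two, the request is still answered.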
…
DNS - DNS Request
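As humans, we remember “google.com” more easily than a numeric address such as “8.8.8.8”. Here, google.com is the domain name and 8.8.8.8 is an example of an IP address (that particular address belongs to Google’s public DNS service rather than to the google.com website itself).
DNS (Domain Name System) servers are specialized servers that host DNS records. These records provide information about a domain, including the IP address that its domain name maps to.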
The process of connecting to a website starts with the browser looking for the IP address in:
- Its own cache
- If not available there, it asks the operating system (OS)
- If not available there, it asks the router’s cache
- If not available there, it asks a DNS server
The lookup is done in this order so that the connection is faster when the website is visited often.
Google is the most visited website in the world, so its IP address will probably already be in the browser’s cache, and the browser will not have to contact a DNS server to look up the records.
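The lookup order above can be sketched as a chain of caches with a DNS query as the last resort. This is a simplified illustration, not how any particular browser implements it: the cache names, the `fake_dns` function, and the address `192.0.2.10` (taken from a range reserved for documentation) are all assumptions, and real caches also expire entries after a time-to-live.

```python
from typing import Callable, Dict, List, Tuple

Cache = Dict[str, str]


def lookup_ip(
    domain: str,
    caches: List[Tuple[str, Cache]],
    query_dns: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (ip, source), checking each cache in order before asking DNS."""
    for name, cache in caches:
        if domain in cache:
            return cache[domain], name
    ip = query_dns(domain)
    # Store the answer in every cache so the next visit is faster.
    for _, cache in caches:
        cache[domain] = ip
    return ip, "dns"


def fake_dns(domain: str) -> str:
    # Hypothetical stand-in for a real DNS server.
    return {"example.com": "192.0.2.10"}[domain]


caches = [("browser", {}), ("os", {}), ("router", {})]
print(lookup_ip("example.com", caches, fake_dns))  # first visit: answered by DNS
print(lookup_ip("example.com", caches, fake_dns))  # second visit: answered by the browser cache
```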
…
OSI Model
“OSI stands for Open Systems Interconnection. It has been developed by ISO — ‘International Organization for Standardization‘, in the year 1984. It is a 7 layer architecture with each layer having specific functionality to perform. All these 7 layers work collaboratively to transmit the data from one person to another across the globe.” Taken from GeeksforGeeks.
The OSI model is a reference standard: devices on the internet implement the TCP/IP protocol suite, whose layers map roughly onto the OSI layers, and that shared layering is what lets them interoperate. This applies both to the computer trying to connect to Google and to Google’s servers (the client-server model).
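A toy sketch can show how the seven layers cooperate: on the sending side each layer wraps the data handed down from the layer above, and the receiving side unwraps it in the opposite order. The bracketed text envelopes are an assumption made for readability; real layers add binary headers, and the physical layer transmits bits rather than adding a header of its own.

```python
# The seven OSI layers, from layer 7 (closest to the user) down to layer 1.
OSI_LAYERS = [
    "application",
    "presentation",
    "session",
    "transport",
    "network",
    "data link",
    "physical",
]


def encapsulate(message: str) -> str:
    """Wrap the message in one labelled envelope per layer, top layer first."""
    wrapped = message
    for layer in OSI_LAYERS:
        wrapped = f"[{layer}|{wrapped}]"
    return wrapped


def decapsulate(wrapped: str) -> str:
    """Remove the envelopes in reverse order, as the receiving machine would."""
    for layer in reversed(OSI_LAYERS):
        prefix = f"[{layer}|"
        if not (wrapped.startswith(prefix) and wrapped.endswith("]")):
            raise ValueError(f"missing {layer} envelope")
        wrapped = wrapped[len(prefix):-1]
    return wrapped


on_the_wire = encapsulate("GET /")
print(on_the_wire)
print(decapsulate(on_the_wire))
```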
…
TCP
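TCP (Transmission Control Protocol) belongs to the Transport layer of the OSI model. It defines how data moves between client and server: it sets up a connection, splits the data into numbered segments, and retransmits anything that gets lost so that everything arrives complete and in order.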
“First, application programs send messages or streams of data to one of the Internet Transport Layer Protocols, either the User Datagram Protocol (UDP) or the Transmission Control Protocol (TCP). These protocols receive the data from the application, divide it into smaller pieces called packets, add a destination address, and then pass the packets along to the next protocol layer, the Internet Network layer.
The Internet Network layer encloses the packet in an Internet Protocol (IP) datagram, puts in the datagram header and trailer, decides where to send the datagram (either directly to a destination or else to a gateway), and passes the datagram on to the Network Interface layer.
The Network Interface layer accepts IP datagrams and transmits them as frames over a specific network hardware, such as Ethernet or Token-Ring networks.” Taken from IBM.
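The "divide it into smaller pieces" step from the quote can be sketched as follows. Each piece carries the byte offset where it starts, so the receiver can put the pieces back in order even if they arrive shuffled. This is a simplification: real TCP sequence numbers also count bytes, but they start from a random initial value, and lost segments are acknowledged and retransmitted, which is not modeled here.

```python
import random
from typing import List, Tuple

Segment = Tuple[int, bytes]  # (sequence number = starting byte offset, payload)


def segment(data: bytes, size: int) -> List[Segment]:
    """Split data into pieces of at most `size` bytes, each tagged with its offset."""
    return [(offset, data[offset:offset + size]) for offset in range(0, len(data), size)]


def reassemble(segments: List[Segment]) -> bytes:
    """Sort the pieces by sequence number and join the payloads."""
    ordered = sorted(segments, key=lambda seg: seg[0])
    return b"".join(payload for _, payload in ordered)


message = b"GET / HTTP/1.1"
pieces = segment(message, 4)
random.shuffle(pieces)  # simulate segments arriving out of order
print(pieces)
print(reassemble(pieces))
```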
…
HTTP - HTTPS/SSL
“Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transmitting hypermedia documents, such as HTML. It was designed for communication between web browsers and web servers, but it can also be used for other purposes. HTTP follows a classical client-server model, with a client opening a connection to make a request, then waiting until it receives a response.” Taken from developer.mozilla.org.
Up to this point, the browser has found Google’s IP address in its cache and opened a TCP connection to send an HTTP request.
HTTP messages are human-readable text with a fixed structure: a request line (method, path and protocol version), headers such as Host, and an optional body.
The browser interprets the response sent by the server and uses the information in its body to render the HTML, CSS and JavaScript of google.com.
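To show what "readable and structured" means in practice, the sketch below builds the text of a simple HTTP/1.1 GET request and parses a sample response into status code, headers and body. The sample response is made up for illustration and is not what google.com actually returns; nothing is sent over the network.

```python
from typing import Dict, Tuple


def build_request(host: str, path: str = "/") -> str:
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # required header in HTTP/1.1
        "Connection: close",
        "",                      # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw: str) -> Tuple[int, Dict[str, str], str]:
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, status_code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status_code), headers, body


# Hypothetical response, written by hand for this example.
sample_response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>Hello</html>"

print(build_request("google.com"))
print(parse_response(sample_response))
```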
HTTPS is the secure version of HTTP: it encrypts the communication with TLS (the successor to SSL), so that it cannot be read or modified in transit.
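As a rough illustration of the client side of HTTPS, Python's standard `ssl` module can wrap a TCP socket in an encrypted connection. The helper names below are hypothetical. The defaults of `ssl.create_default_context()` verify the server's certificate and hostname, which is what protects against someone impersonating the site. Only the context is created when this block runs; `open_tls_connection` needs network access and is shown but not called.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a TLS context with certificate and hostname checks enabled."""
    return ssl.create_default_context()


def open_tls_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TCP connection and upgrade it to TLS (requires network access)."""
    context = make_client_context()
    raw_sock = socket.create_connection((host, port), timeout=5)
    # server_hostname lets the context check that the certificate matches the host.
    return context.wrap_socket(raw_sock, server_hostname=host)


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # certificates must be valid
print(context.check_hostname)                    # certificate must match the hostname
```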
…
Load Balancer
When connecting to a server, the first point of contact is usually the load balancer.
A load balancer distributes the workload across different servers or applications. This improves the overall performance and capacity of the infrastructure.
It’s especially important during peak visit windows, when it has to handle a large number of incoming requests and route them to multiple servers based on their capacity.
Therefore, Google’s Load Balancer will redirect us to one of their servers.
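One possible way to distribute requests "based on capacity" is to send each new request to the server with the lowest load relative to its capacity. This is only a sketch of that one strategy; the source does not say which algorithm Google uses, and the server names and capacities are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    name: str
    capacity: int    # hypothetical maximum number of concurrent requests
    active: int = 0  # requests currently being handled

    @property
    def load(self) -> float:
        return self.active / self.capacity


def pick_backend(backends: List[Backend]) -> Backend:
    """Choose the least-loaded backend that still has room, and assign it one request."""
    available = [b for b in backends if b.active < b.capacity]
    if not available:
        raise RuntimeError("all backends are at capacity")
    chosen = min(available, key=lambda b: b.load)
    chosen.active += 1
    return chosen


backends = [Backend("web-1", capacity=2), Backend("web-2", capacity=4)]
assigned = [pick_backend(backends).name for _ in range(6)]
print(assigned)
```

The larger server ends up with more of the requests, and no server is pushed past its capacity.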
…
Firewall
“A firewall is a device used in network security to monitor incoming and outgoing network traffic and determine whether to allow or block it based on a predetermined set of security rules.
The purpose of a firewall is to reduce or eliminate unwanted network connections and increase the free flow of legitimate traffic.” Taken from liquidweb
A firewall is necessary to protect the privacy and security of data.
After our HTTP request has been routed by the load balancer and has passed through the firewall, it’s finally time to reach the server.
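The "predetermined set of security rules" mentioned in the quote can be pictured as an ordered list in which the first matching rule wins and anything unmatched is blocked. The rule format and the addresses (taken from ranges reserved for documentation) are assumptions for illustration; real firewalls match on many more fields.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass
class Rule:
    action: str          # "allow" or "block"
    source: str          # network in CIDR notation, e.g. "0.0.0.0/0" for any IPv4 address
    port: Optional[int]  # None matches any port


def decide(rules: List[Rule], source_ip: str, port: int, default: str = "block") -> str:
    """Return the action of the first rule that matches, or the default action."""
    for rule in rules:
        from_network = ip_address(source_ip) in ip_network(rule.source)
        port_matches = rule.port is None or rule.port == port
        if from_network and port_matches:
            return rule.action
    return default


rules = [
    Rule("block", "203.0.113.0/24", None),  # hypothetical network known to misbehave
    Rule("allow", "0.0.0.0/0", 443),        # HTTPS from anyone else
]

print(decide(rules, "198.51.100.7", 443))  # allow
print(decide(rules, "203.0.113.9", 443))   # block
print(decide(rules, "198.51.100.7", 22))   # block (no rule matches, default applies)
```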
…
Web Server
A web server can refer to software, to hardware, or to both working together. It handles HTTP and HTTPS requests and responds by serving static content such as HTML files, CSS, images, videos, etc. This content is hosted on the server, delivered in the body of the HTTP response, and interpreted by the client’s browser.
Besides HTTP, some web servers also support SMTP (Simple Mail Transfer Protocol) and FTP (File Transfer Protocol), used for email and for file transfer and storage.
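Serving static content boils down to mapping the requested path to a file on disk and returning its bytes with a suitable Content-Type. The function below is a minimal sketch of that step and does not mirror any specific web server. `serve_static` is a hypothetical name, and the site is created in a temporary directory so the example is self-contained.

```python
import mimetypes
import tempfile
from pathlib import Path
from typing import Tuple


def serve_static(root: Path, url_path: str) -> Tuple[int, str, bytes]:
    """Return (status code, content type, body) for a file under `root`."""
    root = root.resolve()
    target = (root / url_path.lstrip("/")).resolve()
    # Refuse paths that escape the site folder (e.g. "/../secret") or do not exist.
    if root not in target.parents or not target.is_file():
        return 404, "text/plain", b"Not Found"
    content_type = mimetypes.guess_type(target.name)[0] or "application/octet-stream"
    return 200, content_type, target.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    (site / "index.html").write_text("<h1>Hi</h1>", encoding="utf-8")
    print(serve_static(site, "/index.html"))
    print(serve_static(site, "/../etc/passwd")[0])
```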
…
Application Server
“When application users, be it staff or web clients, request access to an application, the application server often does the heavy lifting on the backend to store and process dynamic application requests.
Application servers physically or virtually sit between database servers storing application data and web servers communicating with clients.” Taken from serverwatch.
Instead of serving only static content, application servers can generate content dynamically (interacting with the database), and they provide the environment in which applications run.
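To contrast with static files, here is a small sketch of dynamic content using WSGI, the standard interface between Python web applications and servers. The page is computed from the request on every call. The `application` and `call_app` functions and the `name` query parameter are hypothetical; `call_app` invokes the application directly, the way a server would, without opening a socket.

```python
import html
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Build a page from the request instead of reading a fixed file."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["visitor"])[0]
    # Escape user input before placing it in HTML.
    body = f"<h1>Hello, {html.escape(name)}!</h1>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(query_string: str):
    """Invoke the app the way a WSGI server would and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["QUERY_STRING"] = query_string
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body


print(call_app("name=Ada"))
print(call_app(""))
```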
…
Database
A database is an organized collection of data stored in a system. There are many kinds of databases, but the most common are relational (for example, MySQL) and non-relational (for example, MongoDB).
It interacts with the application server, which creates, reads, updates, and deletes information (CRUD).
Relational databases organize information in tables, columns and rows for easier access. Databases also enforce different permissions and levels of access. For example, sysadmins typically have deeper control over and access to databases than outside users.
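The four CRUD operations can be tried out with SQLite, which ships with Python and needs no server. It stands in here for a relational database such as MySQL; the `users` table and the function names are made up for the example.

```python
import sqlite3
from typing import Optional


def create_user(conn: sqlite3.Connection, name: str) -> int:
    """Create: insert a row and return its generated id."""
    cursor = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    return cursor.lastrowid


def read_user(conn: sqlite3.Connection, user_id: int) -> Optional[str]:
    """Read: return the user's name, or None if the row does not exist."""
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None


def update_user(conn: sqlite3.Connection, user_id: int, name: str) -> None:
    """Update: change the name stored for an existing row."""
    conn.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))


def delete_user(conn: sqlite3.Connection, user_id: int) -> None:
    """Delete: remove the row."""
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))


conn = sqlite3.connect(":memory:")  # temporary database that lives only in memory
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

user_id = create_user(conn, "Ada")
print(read_user(conn, user_id))
update_user(conn, user_id, "Grace")
print(read_user(conn, user_id))
delete_user(conn, user_id)
print(read_user(conn, user_id))
```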
…
In Conclusion
The internet is a fascinating world full of technologies that enable our lives. ✌️
Let’s look at the whole web infrastructure mentioned above:
Bibliography
- https://www.cloudflare.com/learning/ddos/glossary/open-systems-interconnection-model-osi/
- https://en.wikipedia.org/wiki/OSI_model
- https://www.ibm.com/docs/en/aix/7.1?topic=protocol-tcpip-protocols
- https://avinetworks.com/glossary/single-point-of-failure/
- https://www.liquidweb.com/blog/what-is-a-firewall/#what-is-a-firewall
- https://www.serverwatch.com/guides/application-server/