A Uniform Resource Locator (URL) identifies the location of a file or resource on the internet. We use URLs to open websites, images, videos, applications, and other programs hosted on the internet.
To open a file on your computer, you navigate to its location and double-click it. Similarly, to open a file on a remote server, you have to specify where the file is, and this is where URLs help us.
All you have to do is enter the URL of the resource in the navigation bar of your web browser.
The browser finds the server that hosts the resource, requests the file, and loads it.
Though this is all we see when we enter or click a URL, many more steps and processes happen behind the scenes.
Structure of a URL
Before looking at how a URL works, it is essential to know its structure.
A URL can be divided into several sections, each with a specific role in retrieving a file. The major sections of a URL are the protocol, the hostname, and the file or resource location.
Usually, the structure of a URL looks like this: Protocol://hostname/location
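Python's standard `urllib.parse` module can split a URL into these sections. A minimal sketch; the path in the example URL is hypothetical and used only for illustration:

```python
from urllib.parse import urlsplit

# Hypothetical URL following the protocol://hostname/location pattern
url = "https://drive.google.com/drive/my-drive"

parts = urlsplit(url)
print(parts.scheme)    # protocol: https
print(parts.hostname)  # hostname: drive.google.com
print(parts.path)      # resource location: /drive/my-drive
```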
Protocol
A protocol is a set of rules that allows electronic devices to communicate with one another. These rules specify what type of data can be transmitted, how it is transmitted, and the commands used to transfer it. In a URL, the protocol (for example, http or https) tells the browser which set of rules to use when talking to the server.
Host Name or Address
The name of the machine, or the domain name of the server, on which the resource resides.
Resource or File Location
The resource location is the full path to the file on the host, that is, where the file sits on that server.
What happens behind the scenes?
Now that you know the basic structure of a URL, let’s understand what happens when a URL is clicked or entered in the browser.
1. The browser receives the URL
There are two main ways to provide a URL to the web browser. One is to type the URL into the navigation bar. The other is to click a link that points to another resource.
To make the process easier to follow, let us assume that we are trying to open Google Drive (drive.google.com).
2. The browser checks the cache to find a corresponding IP address
The web browser checks its cache for a DNS (Domain Name System) record to find the IP address of the domain in the URL you entered. DNS is a distributed database that maps domain names, such as google.com, to IP addresses.
You can also access a website by typing its IP address in the URL instead of its domain name. For example, entering one of Google's IP addresses in the navigation bar reaches Google just as typing google.com does. Large sites use many IP addresses, so the address a lookup returns can vary by location.
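A program can ask the operating system's resolver for the same kind of answer. The sketch below uses Python's standard `socket.getaddrinfo`; depending on the system, the resolver may answer from a local cache or forward the query to a DNS server. It looks up `localhost` so that it runs without network access; passing a real hostname such as `google.com` performs an actual DNS lookup. The `resolve` helper is a hypothetical name chosen for this example.

```python
import socket


def resolve(hostname: str, port: int = 443) -> list[str]:
    """Return the distinct IP addresses the OS resolver reports for hostname."""
    results = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address as a string
    for *_, sockaddr in results:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


print(resolve("localhost"))  # usually loopback addresses such as 127.0.0.1 or ::1
```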
The browser goes through four caches to check for the DNS record.
- Browser Cache: First, the browser checks its own cache. Browsers store the DNS records of recently visited websites for a limited time, unless the user specifies otherwise.
- OS Cache: The operating system also stores DNS records. If the record is not in the browser's cache, the next step is to check the OS cache.
- Router Cache: If neither the browser cache nor the OS cache has the record, the query goes to the router, which often keeps its own DNS cache.
- ISP Cache: The ISP (Internet Service Provider) cache is the browser's last hope. If everything else fails, the record can be retrieved from the ISP's DNS server cache.
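The cache order above can be modeled as a toy lookup that stops at the first layer holding a record. This is a simplified sketch, not how any browser or OS is implemented: the cache contents are made up, time-to-live expiry is left out, and 192.0.2.10 comes from an address range reserved for documentation, not Google's real IP address. On a hit, the answer is also copied into the browser cache so the next lookup is answered at the first layer.

```python
from typing import Optional

# Hypothetical cache contents; real caches are filled by earlier lookups
# and drop entries when their time-to-live (TTL) expires
browser_cache: dict[str, str] = {}
os_cache: dict[str, str] = {}
router_cache: dict[str, str] = {}
isp_cache: dict[str, str] = {"drive.google.com": "192.0.2.10"}  # documentation-range IP

# Checked in this order, closest cache first
CACHE_LAYERS = [
    ("browser", browser_cache),
    ("OS", os_cache),
    ("router", router_cache),
    ("ISP", isp_cache),
]


def lookup(hostname: str) -> Optional[tuple[str, str]]:
    """Return (layer, ip) for the first cache holding hostname, else None."""
    for layer, cache in CACHE_LAYERS:
        ip = cache.get(hostname)
        if ip is not None:
            browser_cache[hostname] = ip  # remember the answer for next time
            return layer, ip
    return None  # missed everywhere: a full DNS query is needed


print(lookup("drive.google.com"))  # ('ISP', '192.0.2.10')
print(lookup("drive.google.com"))  # ('browser', '192.0.2.10')
print(lookup("example.org"))       # None
```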
3. Other DNS servers are checked for the IP address
If the record is not in the ISP's cache, the ISP's DNS server queries other DNS servers on the internet, one after another, until it either finds the IP address or returns an error message.
Because the ISP's server takes on the whole job of finding the answer on the browser's behalf, the browser's request is called a recursive query, and the ISP's DNS server is called the DNS recursor. The recursor's own queries to other servers are iterative: each server either answers or refers the recursor to the next server down the hierarchy.
Domain names are hierarchical. In drive.google.com, com is the top-level domain, google is the second-level domain, and drive is the third-level domain. Different levels of this hierarchy are served by different DNS servers, which are looked up in turn to find the IP address.
In our example, the DNS recursor first queries a root name server, which refers it to the .com top-level domain (TLD) DNS server. That server, in turn, refers it to the google.com DNS server.
The google.com DNS server finds a match for drive.google.com in its records and returns the IP address to the DNS recursor, which then returns it to the browser.
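The referral chain in this example can be sketched as a toy resolver. Each hypothetical server either answers with an IP address or refers the recursor one level down the hierarchy; the server names (`root`, `com-tld`, `google-ns`) and the documentation-range address 192.0.2.10 are made up, and real DNS message formats, record types, and caching are left out.

```python
# Hypothetical name servers: each maps a domain suffix to an answer or a referral
SERVERS = {
    "root": {"com": ("referral", "com-tld")},
    "com-tld": {"google.com": ("referral", "google-ns")},
    "google-ns": {"drive.google.com": ("answer", "192.0.2.10")},
}


def ask(server: str, hostname: str):
    """Return one server's reply: ("answer", ip), ("referral", server), or None."""
    records = SERVERS[server]
    labels = hostname.split(".")
    # Try the most specific suffix first: drive.google.com, then google.com, then com
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in records:
            return records[suffix]
    return None


def resolve(hostname: str, max_hops: int = 10):
    """Follow referrals from the root until an answer is found."""
    server = "root"  # the recursor starts at a root name server
    visited = [server]
    for _ in range(max_hops):
        reply = ask(server, hostname)
        if reply is None:
            raise LookupError(f"{hostname}: no such domain")
        kind, value = reply
        if kind == "answer":
            return value, visited
        server = value  # referral: ask the next server down the hierarchy
        visited.append(server)
    raise LookupError(f"{hostname}: too many referrals")


print(resolve("drive.google.com"))  # ('192.0.2.10', ['root', 'com-tld', 'google-ns'])
```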
4. The browser opens a TCP connection with the server
After the browser receives the correct IP address, it establishes a connection with the server using communication protocols. Though many protocols are available, HTTP requests usually travel over TCP (Transmission Control Protocol).
Before data packets can be transferred, a connection must be set up. This is done with a method called the TCP three-way handshake.
- The client machine (the user's machine) sends a SYN (Synchronize) packet to the server, asking whether it is open for a new connection.
- If the server is listening on the requested port and can accept the connection, it replies with a SYN/ACK (Synchronize/Acknowledgment) packet.
- When the client receives the SYN/ACK packet, it sends an ACK packet to acknowledge it. The TCP connection is then open for data transmission.
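The handshake can be traced with sequence numbers, a detail the steps above leave out: each side picks an initial sequence number (ISN), and because a SYN consumes one sequence number, each side acknowledges the other's SYN with ISN + 1. A minimal Python sketch that only models the three segments as dictionaries, assuming fixed ISNs for readability:

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Return the three segments of a TCP handshake as plain dictionaries."""
    # 1. Client -> server: SYN carrying the client's initial sequence number
    syn = {"flags": "SYN", "seq": client_isn}
    # 2. Server -> client: SYN/ACK with the server's own ISN, acknowledging
    #    the client's SYN (a SYN consumes one sequence number, hence + 1)
    syn_ack = {"flags": "SYN/ACK", "seq": server_isn,
               "ack": (client_isn + 1) % SEQ_SPACE}
    # 3. Client -> server: ACK acknowledging the server's SYN the same way
    ack = {"flags": "ACK", "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]


# Fixed ISNs for readability; real TCP stacks choose them unpredictably
segments = three_way_handshake(client_isn=1000, server_isn=5000)
for segment in segments:
    print(segment)
```

In practice, programs never build these segments themselves: connecting a TCP socket, for example with Python's `socket.create_connection`, makes the operating system perform the handshake.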
5. The server receives an HTTP request from the browser
Once the connection is established, data transfer can begin. The browser sends a GET request for drive.google.com.
If you submit credentials or fill in a form, on the other hand, the browser sends a POST request to the server.
The request also carries additional information in headers: the cookies (Cookie header), the browser being used (User-Agent header), the types of content it accepts (Accept header), and whether the server should keep the connection open for additional requests (Connection header).
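To see what such a request looks like on the wire, the sketch below assembles the raw text of HTTP/1.1 GET and POST requests. The `build_request` helper, the header values, the cookie, the `example.com/login` target, and the form fields are all hypothetical; a real browser adds more headers and, for https URLs, sends this text inside an encrypted TLS connection.

```python
from urllib.parse import urlencode


def build_request(method, host, path="/", headers=None, form=None):
    """Assemble the raw text of an HTTP/1.1 request (a sketch, not a full client)."""
    body = urlencode(form) if form else ""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # Form submissions are commonly sent URL-encoded in the request body
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # Each line ends with CRLF; an empty line separates the headers from the body
    return "\r\n".join(lines) + "\r\n\r\n" + body


browser_headers = {
    "User-Agent": "ExampleBrowser/1.0",  # hypothetical browser identifier
    "Accept": "text/html",
    "Cookie": "session=abc123",  # hypothetical cookie
    "Connection": "keep-alive",
}

get_request = build_request("GET", "drive.google.com", headers=browser_headers)
post_request = build_request(
    "POST", "example.com", "/login", browser_headers,
    form={"user": "alice", "remember": "1"},  # hypothetical form fields
)
print(get_request)
print(post_request)
```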
6. The request is processed, and the server sends an HTTP response
A web server such as Apache or IIS reads the request and passes control to a request handler.
The request handler is a program, built with a technology such as ASP.NET or PHP, that reads the request, its cookies, and any other information passed along.
It then assembles the response in a format such as HTML, JSON, or XML. Along with the requested page, the response contains a status code, Cache-Control directives, any additional cookies to set, and privacy information.
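Python's standard `http.server` module can play the web server's part in a toy version of this step: the server hands each request to a handler class, which reads the request and its cookies and assembles an HTML response with a status code, a Cache-Control header, and a cookie to set. This is a local sketch, not how Apache, IIS, ASP.NET, or PHP work internally; `PageHandler`, the `/drive` path, the cookie values, and the caching policy are hypothetical. The client half uses `http.client`, which opens the TCP connection and sends the GET request much as a browser would.

```python
import html
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PageHandler(BaseHTTPRequestHandler):
    """Toy request handler: reads the request and assembles an HTML response."""

    def do_GET(self):
        # Read the extra information the browser sent along with the request
        cookie = self.headers.get("Cookie", "none")
        body = (
            "<html><body>"
            f"Requested {html.escape(self.path)} with cookie {html.escape(cookie)}"
            "</body></html>"
        ).encode("utf-8")
        self.send_response(200)  # status line with the status code
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Cache-Control", "max-age=3600")  # hypothetical caching policy
        self.send_header("Set-Cookie", "visited=yes")  # hypothetical cookie to set
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Port 0 lets the OS pick any free port on the loopback interface
server = ThreadingHTTPServer(("127.0.0.1", 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Play the browser's part: connect, send a GET request, and read the response
conn = HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/drive", headers={"Cookie": "session=abc123"})
response = conn.getresponse()
status = response.status
cache_control = response.getheader("Cache-Control")
set_cookie = response.getheader("Set-Cookie")
page = response.read().decode("utf-8")
conn.close()
server.shutdown()
server.server_close()

print(status, cache_control, set_cookie)
print(page)
```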
7. The browser displays the response
The browser then displays the response. The most common response format is HTML, which the browser renders in stages as it parses the page and fetches the images, stylesheets, and scripts the page refers to.
Static files are cached locally so that the browser can retrieve them quickly the next time they are needed. Finally, you see Google Drive open in your browser.
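A simplified model of how the browser decides whether a cached static file can be reused, based on the `max-age` directive of the Cache-Control header the server sent. The `StaticFileCache` class, the URLs, and the file contents are hypothetical, and the injectable clock exists only so the example can simulate time passing.

```python
import time


class StaticFileCache:
    """Toy browser cache that keeps files for the Cache-Control max-age."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock  # returns the current time in seconds
        self.entries = {}  # url -> (content, expiry time)

    def store(self, url, content, cache_control):
        max_age = None
        for directive in cache_control.split(","):
            name, _, value = directive.strip().partition("=")
            name = name.lower()
            if name == "no-store":
                return  # the server asked the browser not to keep a copy
            if name == "max-age" and value.isdigit():
                max_age = int(value)
        if max_age is not None:
            self.entries[url] = (content, self.clock() + max_age)

    def get(self, url):
        """Return the cached content if it is still fresh, otherwise None."""
        entry = self.entries.get(url)
        if entry is None:
            return None
        content, expires = entry
        if self.clock() >= expires:
            del self.entries[url]  # stale: the file must be fetched again
            return None
        return content


# Simulated clock so the example does not have to wait in real time
now = [0.0]
cache = StaticFileCache(clock=lambda: now[0])
cache.store("https://example.com/logo.png", b"PNG...", "public, max-age=60")
print(cache.get("https://example.com/logo.png"))  # b'PNG...'
now[0] = 61.0
print(cache.get("https://example.com/logo.png"))  # None, because 60 seconds have passed
```

Real browsers handle many more directives, and rather than simply discarding a stale entry they can ask the server whether the file has changed (a conditional request) before downloading it again.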
A URL, short for Uniform Resource Locator, is a reference to, or the address of, a file or resource on the internet.
Without domain names, you would need to know a server's IP address to access its files, and IP addresses are hard to remember. Domain names replace them with names that are easy to remember.
A DNS server converts the domain name into the server's IP address, after which the browser establishes a connection with the server using that IP address.
The server receives a request from the browser, processes it, and sends the response back to the browser.
Although this sounds like a long and tedious process, it usually takes less than a second. When we press Enter in the navigation bar or click a link, all these steps happen in milliseconds, too fast for us to notice.