What Happens When You Click a Link? URL/ HTTP/ Domain Name/ DNS/ IP Address

URL

What happens when we hit any URL?

Before discussing what happens after we hit a URL, we need to look at what a URL actually is and what its different parts mean. Without wasting any time, let’s dig into URLs.

URL – Uniform Resource Locator

The full form is self-explanatory: a URL holds the location of the resource we want to access. It is the address of the place we want to go to in order to interact with or find information.

The web site is like the house and the URL (Uniform Resource Locator) is the address.

Anatomy of a URL

Let’s take an example:

https://chienhsiang-hung.github.io/blog/

Here, the first part is https. This tells the browser which protocol to use. It can be http, https, ftp, etc. A protocol is a set of rules that browsers use to communicate over the network. https is the secure version of http, i.e. the information is encrypted while it is exchanged.

The second part, chienhsiang-hung.github.io, is the domain name. You can relate it to your friend’s house: it is the address of the website. We use it to reach the server (a computer set up to serve content) that is responsible for serving the information for that website. Wait! A moment ago I said the URL is the address, and now I am saying the domain name is an address too. You may be confused. Don’t be!

URL vs. Domain Name

Difference between URL and Domain Name

The major difference is that the URL is the complete address. It tells the browser the method (protocol) through which information should be exchanged and the path to follow after reaching the website, whereas the domain name is just one part of a URL.
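To make the split concrete, here is a minimal Python sketch that takes the example URL from above apart with the standard library's urllib.parse.urlsplit. The variable names are only for illustration.

```python
from urllib.parse import urlsplit

# The example URL used earlier in this post.
url = "https://chienhsiang-hung.github.io/blog/"

parts = urlsplit(url)

protocol = parts.scheme       # which protocol the browser should use
domain_name = parts.hostname  # the domain name: only one part of the URL
path = parts.path             # where to go after reaching the website

print(protocol, domain_name, path)
```

This prints https, chienhsiang-hung.github.io and /blog/: the URL is the whole string, and the domain name is just the middle piece.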

Domain Name

As we know, when we hit a URL (or, you could say, a domain name), the website opens with its content. A server (a computer set up to serve content) serves it. We also know that every computer on the internet has an IP address, which is used for communication over the internet. As the name ‘IP address’ suggests, it is an address. When we hit a URL, we are actually reaching the IP address of the computer that is responsible for serving the website content (hosting).

But now you might think: what the hell… is everything an address? Why does the domain name exist if the IP address is there? Why can’t we use the IP address to get the content of the website?

Yes! You can use an IP address to get the content of a website, but really, would you be able to remember the IP address of every website? Obviously not! That’s why domain names were introduced.

You can relate it to your contact list. You can’t remember every person’s number, but you can remember their name. Same concept applies here as well. You can’t remember those scary IP addresses, but you can easily remember domain names.

This huge amount of data is maintained in a database where each domain name is stored with its IP address.

The system that stores domain names with their corresponding IP addresses is known as DNS (Domain Name System).

DNS lookup to find IP address

DNS is like a phone book: it gives us the IP address associated with a domain name, just like a phone book gives us the mobile number associated with a person’s name.

https://www.freecodecamp.org/news/content/images/2019/06/dns.png
DNS
This is the overview, but a domain name query actually passes through four layers.
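The phone-book lookup can be sketched in Python with socket.getaddrinfo, which asks the operating system's resolver for the addresses behind a name. This is only a rough illustration: the resolve helper and its lookup parameter are made up for this post, and the parameter exists only so the function can be tried without network access.

```python
import socket


def resolve(domain_name, lookup=socket.getaddrinfo):
    """Return the distinct IP addresses the resolver reports for a domain name."""
    # Each result is a tuple (family, type, proto, canonname, sockaddr);
    # the IP address is the first item of sockaddr.
    results = lookup(domain_name, None)
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling resolve("chienhsiang-hung.github.io") needs network access, and the addresses it returns depend on when and where you run it.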

More on: What happens when you click on a URL in your browser (freecodecamp.org)

HTTP History

HTTP/0.9

featured-image.png
HTTP-0.9

  • Client sends a request

    -> open a TCP[1]/IP connection: telnet google.com 80

    -> one-line command: GET /about

  • CRLF[2] ends the request

  • Server responds with HTML in ASCII[3]

  • The TCP[1]/IP connection is closed automatically after every request/response
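The exchange above can be sketched in Python with a plain socket instead of telnet. The function names are hypothetical, and most servers today no longer answer HTTP/0.9 requests, so treat fetch_http09 as an illustration of the steps rather than something to rely on.

```python
import socket


def build_http09_request(path):
    """An HTTP/0.9 request is a single line: the GET command ended by CRLF."""
    return f"GET {path}\r\n".encode("ascii")


def fetch_http09(host, path, port=80, timeout=5.0):
    """Send the one-line request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_http09_request(path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Note that there is no status code and no header anywhere: the end of the response is signalled only by the connection being closed.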

HTTP/1.0

/blog/posts/2022/what-happens-when-you-click-a-link/HTTP-1.0.png
HTTP-1.0

  • Still in ASCII[3]

    • Multi-line commands are allowed, including headers

      Method + Header + CRLF[2]

    -> open a TCP[1]/IP connection: telnet google.com 80

    -> command:

    GET /about HTTP/1.0
    Host: www.google.com

    -> an additional CRLF[2] (\r\n, i.e. an empty line) ends the request

    /blog/posts/2022/what-happens-when-you-click-a-link/HTTP-1.0-302-Found.png
    HTTP-1.0-302-Found.png
    /blog/posts/2022/what-happens-when-you-click-a-link/HTTP-1.0-Location-Header.png
    HTTP-1.0-Location-Header

  • Response in ASCII[3]

    1. Status code
    2. Header
    3. Content type

  • The TCP[1]/IP connection is closed automatically after every request/response
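As a rough sketch of the HTTP/1.0 message shapes described above, the Python below builds the multi-line request and splits a response into status code, headers and body. The helper names are made up for this post, and SAMPLE_RESPONSE is a hand-written example (including its Location value), not a captured reply from any real server.

```python
def build_http10_request(path, host):
    """Request line + headers, each ended by CRLF, then one extra CRLF."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the empty line ends the headers
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


# Hand-written illustration of a redirect response.
SAMPLE_RESPONSE = (
    b"HTTP/1.0 302 Found\r\n"
    b"Location: http://www.example.com/about\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html>moved</html>"
)

status, reason, headers, body = parse_response(SAMPLE_RESPONSE)
print(status, reason, headers["location"])
```

The status code and the headers are what HTTP/1.0 added on top of HTTP/0.9: a client can now tell a redirect from a normal page before looking at the body.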

HTTP/1.1

Improved Overall Efficiency

  • Persistent connection: the TCP[1]/IP connection is no longer closed automatically

    The connection is kept alive and reused for other resources on the same domain (so that you don’t need to redo the TCP[1] handshake again and again)

    /blog/posts/2022/what-happens-when-you-click-a-link/HTTP-1.1-Persistent-Connection.png
    HTTP-1.1-Persistent-Connection

  • Chunked transfer encoding: Transfer-Encoding - HTTP | MDN (mozilla.org), which allows large data to be sent in encoded (optionally compressed) chunks

  • Byte range request

    GET / HTTP/1.1
    Host: www.miniasp.com
    Range: bytes=0-100

    GET / HTTP/1.1
    Host: www.miniasp.com
    Range: bytes=101-200

    Each request returns its part of the resource with a 206 Partial Content response

  • Cache control

  • Request pipelining[4]: multiple requests can be sent without waiting for the corresponding responses

    https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/HTTP_pipelining2.svg/1024px-HTTP_pipelining2.svg.png
    Time diagram of non-pipelined vs. pipelined connection
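Two of the features above, persistent connections and byte range requests, can be tried together in a small self-contained Python sketch. It starts a throwaway server on 127.0.0.1 and sends the two Range requests from the example over a single connection. Everything here is an assumption made for illustration: RangeHandler, PAYLOAD and the X-Client-Port header are invented names, and the Range parsing only understands the simple first-last form.

```python
import http.client
import http.server
import threading

PAYLOAD = bytes(range(256))  # made-up resource served for every path


class RangeHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the stdlib server keep connections alive

    def do_GET(self):
        header = self.headers.get("Range", "")
        if header.startswith("bytes="):
            # Simplified parsing: only the "bytes=first-last" form is handled.
            first, _, last = header[len("bytes="):].partition("-")
            start, end = int(first), min(int(last), len(PAYLOAD) - 1)
            body = PAYLOAD[start:end + 1]
            self.send_response(206)  # Partial Content
            self.send_header("Content-Range", f"bytes {start}-{end}/{len(PAYLOAD)}")
        else:
            body = PAYLOAD
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        # Invented header: echoes the client's source port to reveal connection reuse.
        self.send_header("X-Client-Port", str(self.client_address[1]))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_ranges(host, port, ranges):
    """Send one Range request per (start, end) pair over a single connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    results = []
    try:
        for start, end in ranges:
            conn.request("GET", "/", headers={"Range": f"bytes={start}-{end}"})
            response = conn.getresponse()
            client_port = response.getheader("X-Client-Port")
            results.append((response.status, client_port, response.read()))
    finally:
        conn.close()
    return results


def demo():
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), RangeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_ranges("127.0.0.1", server.server_address[1], [(0, 100), (101, 200)])
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() should give two 206 responses whose bodies join up into the first 201 bytes of PAYLOAD. Both responses report the same client port, which is a sign that the second request reused the TCP connection instead of opening a new one.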

Updates

Summary

The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems.

  • Stateless
  • Distributed
  • Collaborative
  • Hypertext: not just “text” but with “links”

Resource

HTTP Request Methods

GET

GET is used to request data from a specified resource.

The query string (name/value pairs) is sent in the URL of a GET request: /test/demo_form.php?name1=value1&name2=value2

  • GET requests can be cached
  • GET requests remain in the browser history
  • GET requests can be bookmarked
  • GET requests should never be used when dealing with sensitive data
  • GET requests have length restrictions
  • GET requests are only used to request data (not modify)
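As a small illustration of how the name/value pairs end up in the URL of a GET request, this Python sketch builds the query string from the example above with urllib.parse.urlencode and then reads it back the way a server might. The variable names are made up.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

fields = {"name1": "value1", "name2": "value2"}

# Client side: the data is appended to the URL after a "?".
query_string = urlencode(fields)
request_target = "/test/demo_form.php?" + query_string

# Server side: the same pairs are recovered from the URL itself.
received = parse_qs(urlsplit(request_target).query)

print(request_target)
print(received)
```

Because the data is part of the URL, it is also what ends up in bookmarks and in the browser history.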

POST

POST is used to send data to a server to create/update a resource.

The data sent to the server with POST is stored in the request body of the HTTP request:

POST /test/demo_form.php HTTP/1.1
Host: w3schools.com

name1=value1&name2=value2

  • POST requests are never cached
  • POST requests do not remain in the browser history
  • POST requests cannot be bookmarked
  • POST requests have no restrictions on data length
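To show where the data travels in a POST request, here is a minimal Python sketch that assembles the raw request bytes for the example above. build_post_request is a hypothetical helper. The Content-Type and Content-Length headers are not in the short example, but they are added here because a server needs them to read the body.

```python
from urllib.parse import urlencode


def build_post_request(path, host, fields):
    """Put the form fields in the request body instead of the URL."""
    body = urlencode(fields).encode("ascii")
    head = "\r\n".join([
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",
        "",  # empty line: end of headers
        "",
    ]).encode("ascii")
    return head + body


request = build_post_request(
    "/test/demo_form.php", "w3schools.com", {"name1": "value1", "name2": "value2"}
)
print(request.decode("ascii"))
```

The request line contains only the path, so nothing from the form shows up in the URL.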

GET vs. POST

  • GET for retrieving data that is not sensitive.
  • POST for sending sensitive data.

The reason to use POST for sending sensitive data is that it prevents data leakage via the query string. One more reason to use POST is that modern browsers don’t appear to cache the results of POST requests by default. However, this should not be relied upon. It is much better to set the Cache-Control: no-store header in your response either way, any time that sensitive data is output. (More: Should I use GET or POST when requesting sensitive data?)

| | GET | POST |
|---|---|---|
| BACK button/Reload | Harmless | Data will be re-submitted (the browser should alert the user that the data are about to be re-submitted) |
| Bookmarked | Can be bookmarked | Cannot be bookmarked |
| Cached | Can be cached | Not cached |
| Encoding type | application/x-www-form-urlencoded | application/x-www-form-urlencoded or multipart/form-data. Use multipart encoding for binary data |
| History | Parameters remain in browser history | Parameters are not saved in browser history |
| Restrictions on data length | Yes, when sending data, the GET method adds the data to the URL; and the length of a URL is limited (maximum URL length is 2048 characters) | No restrictions |
| Restrictions on data type | Only ASCII characters allowed | No restrictions. Binary data is also allowed |
| Security | GET is less secure compared to POST because data sent is part of the URL. Never use GET when sending passwords or other sensitive information! | POST is a little safer than GET because the parameters are not stored in browser history or in web server logs |
| Visibility | Data is visible to everyone in the URL | Data is not displayed in the URL |

PUT

PUT is used to send data to a server to create/update a resource.

The difference between POST and PUT is that PUT requests are idempotent. That is, calling the same PUT request multiple times will always produce the same result. In contrast, calling a POST request repeatedly has the side effect of creating the same resource multiple times.

  • What is the difference between POST and PUT in HTTP?
    • POST means “create new” as in “Here is the input for creating a user, create it for me”.

      You POST to example.com/users since you don’t know the URL of the user yet, you want the server to create it.

    • PUT means “insert, replace if already exists” as in “Here is the data for user 5”.

      You PUT to example.com/users/id since you want to replace/create a specific user.

POSTing twice with the same data creates two identical users with different ids. PUTting twice with the same data creates the user the first time and updates it to the same state the second time (no changes).
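The difference can be sketched with a toy in-memory store in Python. UserStore is a made-up class standing in for the server behind example.com/users; it is not a real framework API, and it ignores details such as an id chosen by PUT later colliding with one handed out by POST.

```python
import itertools


class UserStore:
    """Toy stand-in for the server behind example.com/users."""

    def __init__(self):
        self.users = {}
        self._next_id = itertools.count(1)

    def post(self, data):
        """POST /users: the server picks a new id on every call."""
        user_id = next(self._next_id)
        self.users[user_id] = dict(data)
        return user_id

    def put(self, user_id, data):
        """PUT /users/<id>: create the user, or replace it if it already exists."""
        self.users[user_id] = dict(data)
        return user_id


store = UserStore()

# POSTing twice creates two identical users with different ids.
first = store.post({"name": "Ann"})
second = store.post({"name": "Ann"})

# PUTting twice leaves exactly one user with id 5, in the same state.
store.put(5, {"name": "Bob"})
store.put(5, {"name": "Bob"})

print(sorted(store.users))
```

After these four calls the store holds three users: two copies of Ann under different ids, and a single Bob under id 5.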

HEAD

HEAD is almost identical to GET, but without the response body.

In other words, if GET /users returns a list of users, then HEAD /users will make the same request but will not return the list of users.

HEAD requests are useful for checking what a GET request will return before actually making a GET request - like before downloading a large file or response body.
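Here is a small self-contained Python sketch of that behaviour, using a throwaway server on 127.0.0.1. UsersHandler and the USERS body are invented for illustration. The handler answers HEAD with the same headers as GET and simply skips the body.

```python
import http.client
import http.server
import threading

USERS = b'["ann", "bob"]'  # made-up response body for /users


class UsersHandler(http.server.BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(USERS)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(USERS)

    def do_HEAD(self):
        self._send_headers()  # same headers as GET, but no body is written

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def request(method, port, path="/users"):
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, path)
        response = conn.getresponse()
        return response.status, response.getheader("Content-Length"), response.read()
    finally:
        conn.close()


def demo():
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), UsersHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return request("HEAD", port), request("GET", port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns one result for HEAD and one for GET. Both carry the same status and Content-Length, but only the GET result has a non-empty body, so a client can learn the size of a response before deciding to download it.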

More

HTTP Methods GET vs POST (w3schools.com)


  1. TCP vs UDP: Key Difference Between Them (guru99.com). TCP is a connection-oriented protocol, whereas UDP is a connectionless protocol. The speed for TCP is slower while the speed of UDP is faster. TCP uses a handshake protocol like SYN, SYN-ACK, ACK, while UDP uses no handshake protocols.

  2. CRLF Injection | OWASP Foundation. The term CRLF refers to Carriage Return (ASCII 13, \r) Line Feed (ASCII 10, \n). They’re used to note the termination of a line, however, dealt with differently in today’s popular Operating Systems.

  3. ASCII | Definition, History, Trivia, & Facts | Britannica. ASCII, in full American Standard Code for Information Interchange, a standard data-encoding format for electronic communication between computers. ASCII assigns standard numeric values to letters, numerals, punctuation marks, and other characters used in computers.

  4. HTTP pipelining - Wikipedia. HTTP pipelining is a feature of HTTP/1.1 which allows multiple HTTP requests to be sent over a single TCP connection without waiting for the corresponding responses.