How Web Browsers Work (in Plain English)

The web browser is arguably the most common way for users to access content on the World Wide Web. In the client/server model, the web browser is the client: it runs on a computer (or any other Internet-enabled device that supports a browser) and sends requests for information to the web server. In response, the web server sends data back to the browser, which then displays the results to the user.

Today’s browsers are full-featured software suites. From plain HTML pages to web applications built with JavaScript and AJAX, they can interpret and display all sorts of content hosted on web servers. Most (if not all) browsers also offer plug-ins that further extend their capabilities.

The following are the major browsers:

  • Internet Explorer 
  • Firefox
  • Google Chrome
  • Safari
  • Opera

Functionality of a Web Browser:

Put simply, browsers are all about four things: fetching, processing, displaying, and storing content.

Structure of a Web Browser:

  • User Interface
  • Browser Engine
  • Rendering Engine
  • Data Storage
  • UI Backend
  • JavaScript Interpreter
  • Networking

Let’s explore each item in detail.

User Interface:

This is where the user interacts with the browser through the controls it provides. Everything except the window that displays the requested web page belongs to the user interface. If you compare the major browsers, you’ll find that they differ quite a lot in look and feel. The reason is that no formal standard specifies how a web browser should look. The HTML5 specification doesn’t define user interface elements, but it does list some common ones:

  • location bar
  • personal bar
  • status bar
  • toolbar

Browser Engine:

You can think of the browser engine as a bridge between the UI and the underlying rendering engine. Based on your interaction with the UI controls, the browser engine queries and manipulates the rendering engine. It provides a way to start loading a Uniform Resource Locator (or, to put it simply, a URL) and takes care of the reload, back, and forward browsing actions.
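To make the load, reload, back, and forward bookkeeping a little more concrete, here is a minimal Python sketch. The class name `NavigationHistory` and its methods are made up for illustration; no real browser engine is claimed to be organised this way.

```python
class NavigationHistory:
    """Toy model of the URL bookkeeping behind load/reload/back/forward."""

    def __init__(self):
        self.entries = []   # URLs visited, oldest first
        self.index = -1     # position of the current page in entries

    def load(self, url):
        # Loading a new URL discards any "forward" entries.
        del self.entries[self.index + 1:]
        self.entries.append(url)
        self.index += 1
        return url

    def back(self):
        if self.index > 0:
            self.index -= 1
        return self.current()

    def forward(self):
        if self.index < len(self.entries) - 1:
            self.index += 1
        return self.current()

    def reload(self):
        # Reloading asks for the current URL again.
        return self.current()

    def current(self):
        return self.entries[self.index] if self.index >= 0 else None
```

In this model, going back and then loading a new page drops the old forward entry, which matches the behaviour you see when the forward button greys out.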

Rendering Engine:

As its name suggests, the rendering engine is responsible for rendering the content of the requested web page on the screen. By default, it displays HTML, XML, and images. With plug-ins or extensions, it can display other types of data too.

As mentioned above, browsers differ a lot in look and feel, but that’s not the only area where they differ. They also use different rendering engines. Let’s take a look at which rendering engines the major browsers use:

  • Internet Explorer: Trident
  • Firefox & other Mozilla browsers: Gecko
  • Chrome & Opera 15+: Blink
  • Chrome (iPhone) & Safari: WebKit

Networking:

This component handles all network communication for the browser. It uses HTTP, FTP, and other communication protocols to fetch resources from the requested URLs.

To resolve the host name in a requested URL, the browser relies on DNS. DNS records are cached in the browser, the operating system, and so on. If the host name is not found in any cache, a DNS query is sent to a DNS server (typically the one run by the internet service provider, or ISP) to find the server’s IP address. Once the browser has the IP address, it opens a connection to the server. It first sends a SYN (short for synchronize) packet, asking whether the server is open to a TCP connection.
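The cache-first lookup described above can be sketched in a few lines of Python. This is only an illustration: the `resolve` function and the `_dns_cache` dictionary are hypothetical names, and real caches also expire entries after a time-to-live, which this sketch ignores.

```python
import socket

_dns_cache = {}  # hostname -> IP address (hypothetical in-process cache)

def resolve(hostname, lookup=socket.gethostbyname):
    """Return an IP address for hostname, consulting the cache first."""
    if hostname in _dns_cache:
        return _dns_cache[hostname]   # cache hit: no query is sent
    ip = lookup(hostname)             # cache miss: ask the system resolver
    _dns_cache[hostname] = ip
    return ip
```

The `lookup` parameter defaults to the standard library’s `socket.gethostbyname`, but it can be swapped for a stub, which makes it easy to see that the second request for the same name never leaves the cache.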

The server acknowledges the SYN packet by replying with a SYN/ACK packet (ACK is short for acknowledgment). Once the browser receives it, it replies with an ACK packet of its own. After this three-way exchange, a TCP connection is established between the browser and the server. Data can then be exchanged over this connection, provided both sides follow the rules of the HTTP protocol for connections, messages, requests, and responses.
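If you want to watch this exchange from code, the Python sketch below opens a TCP connection and sends a hand-written HTTP request. The handshake itself is performed by the operating system inside `socket.create_connection`; the code never builds SYN or ACK packets. To keep the example self-contained, it talks to a tiny throwaway server on the local machine. `HelloHandler`, `fetch`, and `demo` are illustrative names, not part of any browser.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Throwaway local server so the example needs no Internet access."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

def fetch(host, port, path="/"):
    # create_connection() makes the OS perform the SYN, SYN/ACK, ACK handshake.
    with socket.create_connection((host, port), timeout=5) as conn:
        request = (f"GET {path} HTTP/1.1\r\n"
                   f"Host: {host}\r\n"
                   "Connection: close\r\n\r\n")
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

def demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the raw bytes of the response: a status line, the headers, a blank line, and then the body.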

JavaScript Interpreter: 

JavaScript, one of the World Wide Web’s core technologies, enables interactive web pages, and all major web browsers have a dedicated JavaScript engine to execute it.

The script tag is often placed at the end of the HTML body, primarily because the browser pauses DOM tree construction as soon as it encounters a script tag (unless the script is marked async or defer) and keeps it on hold until the script has been fetched and executed. The JavaScript interpreter, as its name suggests, interprets the JS code embedded in a website and passes the results to the rendering engine for display. Let’s take a look at which JavaScript engine each major browser uses:

  • Chrome: V8 (Node.js is built on top of it)
  • Firefox: SpiderMonkey
  • Microsoft Edge: Chakra in the original Edge; V8 in the current Chromium-based Edge
  • Safari: JavaScriptCore, marketed as Nitro (an earlier version was code-named SquirrelFish)
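To see why script placement matters, consider the toy Python model below. It does not run any JavaScript. It only records which elements have been parsed at the moment each script tag closes, which is roughly what a parser-blocking script would be able to see. `ScriptAwareParser` and `visible_to_scripts` are hypothetical names built on the standard library’s `html.parser` module.

```python
from html.parser import HTMLParser

class ScriptAwareParser(HTMLParser):
    """Records which elements exist when each <script> element ends."""

    def __init__(self):
        super().__init__()
        self.elements = []    # tags parsed so far, in document order
        self.snapshots = []   # what each script could "see" when it ran

    def handle_starttag(self, tag, attrs):
        self.elements.append(tag)

    def handle_endtag(self, tag):
        if tag == "script":
            # A real browser would pause here and run the script, so the
            # script only sees the elements parsed up to this point.
            self.snapshots.append(list(self.elements))

def visible_to_scripts(html):
    parser = ScriptAwareParser()
    parser.feed(html)
    parser.close()
    return parser.snapshots
```

Given a page with one script in the head and another at the end of the body, the first snapshot contains no paragraph elements from the body, while the second one does.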

UI Backend: 

Basic widgets such as combo boxes and windows are drawn using the UI backend. It exposes a generic interface that is not platform-specific, and underneath it uses the operating system’s user interface methods.

Data Storage:

This is a persistence layer that lets the browser store data locally (such as cookies, localStorage, IndexedDB, WebSQL, the FileSystem API, preferences, and so forth). The HTML5 specification also describes a complete database that lives inside the browser.

Web content is displayed through a series of steps. Let me walk you through them:

By now, the role of the networking layer should be clear. Once the networking layer starts receiving the contents of the requested document, it passes them to the rendering engine in chunks of 8 KB. The rendering engine parses each chunk and converts the HTML elements in it into DOM nodes in a tree called the “DOM tree”. It also parses the style data, both in external CSS files and in style elements.
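Here is a rough Python sketch of that chunk-by-chunk tree building. It leans on the standard library’s `html.parser`, which accepts input piece by piece. The `Node`, `TreeBuilder`, `build_dom`, and `outline` names are invented for this example, and the tree is far simpler than a real DOM (no attributes, no error recovery beyond ignoring stray closing tags).

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # never have children

class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""   # only used by "#text" nodes

class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the matching open element; ignore stray closing tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        last = self.current.children[-1] if self.current.children else None
        if last is not None and last.tag == "#text":
            last.text += data   # text split across chunks is merged
        else:
            node = Node("#text", self.current)
            node.text = data
            self.current.children.append(node)

def build_dom(html, chunk_size=8192):
    builder = TreeBuilder()
    for start in range(0, len(html), chunk_size):
        builder.feed(html[start:start + chunk_size])  # one "network chunk"
    builder.close()
    return builder.root

def outline(node, depth=0):
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines
```

Feeding the same markup in tiny chunks or in one piece produces the same tree, because the parser holds on to an incomplete tag until the rest of it arrives.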

While all this is happening, the browser constructs another tree called the render tree. It is the visual representation of the document and defines the order in which the visual elements will be displayed. The purpose of this tree is to allow the contents to be painted in the correct order.

Once the render tree has been constructed, the process moves on to the next stage, called layout. To render the page, the exact size and position of each piece of content must be calculated; this calculation is called layout or reflow. It gives each node the exact coordinates where it should appear on the display. The root renderer is positioned at 0,0, and its dimensions are those of the viewport (the visible part of the browser window).
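As a rough illustration of layout, the Python sketch below stacks boxes vertically and gives each one coordinates, starting with the root at 0,0. It assumes a drastically simplified model in which every box is as wide as its parent and the heights are known up front. The `Box` class and both functions are hypothetical and only meant to show the idea.

```python
from dataclasses import dataclass, field

@dataclass
class Box:
    name: str
    content_height: int = 0   # height of the box's own content, in pixels
    children: list = field(default_factory=list)
    x: int = 0
    y: int = 0
    width: int = 0
    height: int = 0

def layout(box, x, y, width):
    """Assign positions top-down and compute heights bottom-up."""
    box.x, box.y, box.width = x, y, width
    cursor = y + box.content_height
    for child in box.children:
        layout(child, x, cursor, width)   # each child goes below the previous one
        cursor += child.height
    box.height = cursor - y
    return box

def layout_document(root, viewport_width, viewport_height):
    layout(root, 0, 0, viewport_width)    # the root box starts at 0,0
    # The root takes the dimensions of the viewport.
    root.width, root.height = viewport_width, viewport_height
    return root
```

With a header 50 pixels tall followed by a section holding two paragraphs, the paragraphs end up at y = 50 and y = 70 when the first one is 20 pixels tall.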

The next stage is painting. In this stage, the render tree is traversed, and each renderer’s “paint()” method is called to display its content on the screen. Painting uses the UI backend layer.
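The traversal can be pictured with the short Python sketch below. It is only a model: instead of drawing through a UI backend, each hypothetical `Renderer` appends a line to a list, so you can read off the order in which things would be painted. Real engines follow more elaborate ordering rules than this simple parent-before-children walk.

```python
class Renderer:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def paint(self, canvas):
        canvas.append(f"draw {self.name}")   # stand-in for real drawing calls
        for child in self.children:
            child.paint(canvas)              # children are painted after their parent

def paint_tree(root):
    canvas = []   # a plain list standing in for the UI backend
    root.paint(canvas)
    return canvas
```

Because a parent is painted before its children, a background ends up underneath the text that sits on it.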

The rendering engine does not wait for HTML parsing to finish before building and laying out the render tree. For a better user experience, it tries to display content on the screen as soon as possible: it parses and displays what it has already received from the networking layer while the rest of the content keeps pouring in from the network.

That’s all, folks!