Understanding OAuth 2.0: Architecture, Use Cases, Benefits, and Limitations (Part 3 — PKCE)

Anirban Bhattacherji
7 min read · Jul 11, 2023


In OAuth 2.0 Part 1, we focused on the distinctions between SAML and OAuth 2.0 and identified the primary actors involved in the OAuth 2.0 process. In Part 2, we walked through the process flow, various use cases, and limitations of OAuth 2.0. In this part, we will explore several additional topics associated with OAuth 2.0.

To begin, we will examine the channels used in the OAuth flow. Next, we will look at the implicit grant flow and its security issues. Lastly, we will explore how the Authorization Code flow with PKCE (Proof Key for Code Exchange) overcomes those issues.

Communication Channels

OAuth 2.0 uses two primary communication channels, each with its own characteristics and uses.

Front Channel

The front channel is the browser-based part of the OAuth flow. When the user (or resource owner) is redirected to the authorization server, and then back to the client application, this redirection occurs via the front channel. The information passed in the front channel can be seen (and potentially manipulated) by the user or others with access to the user’s browser.

The front channel is used to:

  • Send the user to the authorization server’s login page and authorization prompt
  • Redirect the user back to the client application with the authorization code

The front channel is less secure than the back channel, as it can be exposed to various risks, like the user copying the URL or a malicious extension in the user’s browser intercepting the data. That’s why sensitive data (like access tokens) should not be sent over the front channel. Instead, only an authorization code is sent, which is useless without the client’s credentials that are required to exchange it for an access token.
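
As an illustration, a typical front-channel redirect back to the client (using placeholder values) carries only the short-lived authorization code and the state parameter, never the access token itself:

HTTP/1.1 302 Found
Location: https://client.example.com/callback?code=AUTHORIZATION_CODE&state=xyz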

Back Channel

The back channel refers to direct communication between the client application and the authorization server, outside of the user’s browser. This communication happens via server-to-server requests and is more secure because it can’t be observed or interfered with by the user or third parties.

The back channel is used to:

  • Exchange the authorization code for an access token
  • Refresh an access token using a refresh token
  • Get user info in OpenID Connect

The back channel is more secure and is used for the most sensitive parts of the OAuth 2.0 flow, but it requires the client application to be able to keep secrets, so it can’t be used by client-side JavaScript applications or mobile apps that don’t have a server component.

Overall, these two channels are used together to balance security and usability in the OAuth 2.0 flow.

OAuth 2.0 flow

Remember this diagram from our Part 2 discussion? The channels highlighted in green are front-channel interactions, and the rest are back-channel interactions.

What if the client cannot store a client secret?

Consider a scenario where you have a browser-based application that acts as the client but cannot store a client secret securely. How can we implement the OAuth 2.0 process in such a case? In this scenario, the Implicit Grant Flow is used, whereby the front channel is employed to retrieve access tokens directly. Let’s try to understand it.

Implicit Grant Flow

This flow was originally designed for JavaScript-based applications running in the browser, also known as Single-Page Apps (SPAs), which cannot maintain the confidentiality of a client secret due to their public nature.

In this flow, instead of receiving an authorization code that needs to be exchanged for an access token, the client receives the access token directly. After the user authorizes the client, the authorization server redirects the user back to the client with the access token in the URL fragment.
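
For example, the redirect back to the client in the implicit flow (with placeholder values) might look like the following, with the access token carried in the URL fragment after the "#":

HTTP/1.1 302 Found
Location: https://client.example.com/callback#access_token=ACCESS_TOKEN&token_type=bearer&expires_in=3600&state=xyz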

However, it has a number of security issues, as described below:

  1. Exposure of Access Token in URL: In the Implicit flow, the access token is returned directly in the redirect URI, which means it’s exposed in the URL. URLs can be logged in various places (like browser history, HTTP logs, or possibly leaked through the Referer header), leading to potential unauthorized access to the token.
  2. Access Token Sent Over Front Channel: Because the access token is sent over the front channel, it’s susceptible to interception during transmission. For instance, a malicious script running on the user’s browser could potentially steal the access token.
  3. Lack of Client Authentication: The Implicit flow does not authenticate the client, meaning the authorization server cannot confirm which client the access token is being issued to. As a result, the access token could be issued to a malicious application without the authorization server realizing it.
  4. No Refresh Tokens: The Implicit flow does not provide refresh tokens, due to the insecure nature of the client. This can limit the lifetime of the client’s access to the resource server, as the client will need to re-authorize once the access token expires.

These issues have led to it being largely deprecated in favor of the Authorization Code flow with Proof Key for Code Exchange (PKCE).

Proof Key for Code Exchange (PKCE)

The Authorization Code flow with PKCE (Proof Key for Code Exchange) is a more secure replacement for the Implicit flow and is designed to work with public clients. It’s a variation of the standard Authorization Code flow, but it doesn’t require a client secret and includes additional protections against authorization code interception attacks.

In this flow, the client creates a unique string called a “code verifier”, generates a hashed version called a “code challenge”, and sends the code challenge and a method parameter (indicating the hash method used) to the authorization server in the initial authorization request. The authorization server stores these and associates them with the authorization code that it returns to the client.

When the client requests an access token, it sends the authorization code and the original code verifier. The authorization server verifies that the code verifier matches the code challenge associated with the authorization code, and if they match, it returns an access token to the client.

This ensures that even if the authorization code is intercepted, it’s useless without the code verifier. And because the code verifier is never sent over the front channel, it’s much less likely to be intercepted.

Flowchart for Proof Key for Code Exchange (PKCE)

Let’s now walk through the Authorization Code flow with PKCE (Proof Key for Code Exchange) step by step. It is easier to follow when visualized as a flowchart:

Proof Key for Code Exchange (PKCE) flow chart
  • Create Code Verifier and Code Challenge: The client application generates a cryptographically random string, which is the "code verifier". A "code challenge" is derived from the code verifier by using a hash function like SHA256 and then Base64-URL encoding the result.
  • Authorization Request: The client redirects the user to the authorization server’s authorization endpoint, including parameters such as response_type=code, client_id=<client_id>, redirect_uri=<redirect_uri>, scope=<scope>, code_challenge=<code_challenge>, and code_challenge_method=<method>, where <code_challenge> is the generated code challenge and <method> is the hash function used ("S256" for SHA256). Below is a sample request
GET /authorize?response_type=code
&client_id=CLIENT_ID
&redirect_uri=https%3A%2F%2Fclient%2Eexample%2Ecom%2Fcallback
&scope=read
&state=xyz
&code_challenge=CODE_CHALLENGE
&code_challenge_method=S256
HTTP/1.1
Host: server.example.com
  • User Login and Consent: The user logs in to the authorization server and grants consent to the client application.
  • Return Authorization Code: The authorization server redirects the user back to the client with an authorization code in the URL.
HTTP/1.1 302 Found
Location: https://client.example.com/callback?code=AUTHORIZATION_CODE&state=xyz
  • Exchange Authorization Code for Access Token: The client makes a POST request to the token endpoint of the authorization server, sending the authorization code, the redirect URI, and the original code verifier in the body of the POST request. The grant_type parameter is set to authorization_code.
POST /token HTTP/1.1
Host: server.example.com
Content-Type: application/x-www-form-urlencoded

grant_type=authorization_code
&code=AUTHORIZATION_CODE
&redirect_uri=https%3A%2F%2Fclient%2Eexample%2Ecom%2Fcallback
&client_id=CLIENT_ID
&code_verifier=CODE_VERIFIER
  • Validate Code Verifier: The authorization server creates a code challenge from the received code verifier following the same method used by the client in step 1. If this newly computed code challenge matches the code challenge associated with the received authorization code, the server knows that the client making the request is the same one that initiated the authorization process.
  • Return Access Token: The authorization server returns an access token (and possibly a refresh token) to the client.
HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
Cache-Control: no-store
Pragma: no-cache

{
"access_token": "ACCESS_TOKEN",
"token_type": "bearer",
"expires_in": 3600,
"refresh_token": "REFRESH_TOKEN",
"scope": "read"
}
  • Request Resource: The client uses the access token to request the protected resource from the resource server (a sample request is shown just after this list).
  • Return Protected Resource: The resource server validates the access token, and if it’s valid, it returns the requested resource to the client.
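
For completeness, a resource request using the bearer access token (the resource host and path here are placeholders) could look like this:

GET /resource HTTP/1.1
Host: resource.example.com
Authorization: Bearer ACCESS_TOKEN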

In each of these steps, the code verifier never leaves the client, and the code challenge is sent only over secure channels, which significantly improves the security of the OAuth flow. The PKCE enhancement mitigates the threat of authorization code interception attacks.

So, if we draw the UML diagram, the PKCE steps would look like below:

Proof Key for Code Exchange (PKCE) steps

You might be wondering about the “code verifier” and the “code challenge” used in the PKCE flow. To facilitate understanding, let’s explore a straightforward example.

Generating a code challenge for PKCE involves a couple of steps as below:

Generate a Code Verifier: This is a cryptographically random string using characters in the ranges A-Z, a-z, 0–9, “-”, “.”, “_”, and “~”, and between 43 and 128 characters long.

Let’s say our code verifier is:

3TcF-9hEt3T5f5kB2Zpvm_JPxIgjRZT4qW-3mPcG1vw
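
In Python, such a verifier could be generated with the standard library, for example with the secrets module (a minimal sketch; the length parameter is just one reasonable choice):

import secrets

# 32 random bytes, Base64-URL encoded without padding, gives a
# 43-character string within the allowed 43-128 character range
code_verifier = secrets.token_urlsafe(32)
print(code_verifier)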

Generate a Code Challenge: To create the code challenge, you hash the code verifier using the SHA256 hash function and then encode the result using Base64 URL encoding.

In Python, generating a code challenge from the code verifier could look like this:

import base64
import hashlib

code_verifier = '3TcF-9hEt3T5f5kB2Zpvm_JPxIgjRZT4qW-3mPcG1vw'

# SHA-256 hash of the ASCII code verifier
hashed = hashlib.sha256(code_verifier.encode()).digest()

# Base64-URL encode the hash and strip the trailing "=" padding
code_challenge = base64.urlsafe_b64encode(hashed).decode().rstrip("=")

print(code_challenge)

This will print out a string similar to:

YVJFpZN-jRH0wPSjIx4sT6fF-YUuh_WXaZ9Tf7hHk5A

This value is the code challenge which you send to the authorization server in the initial authorization request.
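
On the server side, the "Validate Code Verifier" step is essentially the same computation performed again and compared against the stored challenge. Below is a minimal Python sketch, assuming the authorization server kept the code_challenge and code_challenge_method it received in the authorization request (the function name is illustrative, not from any particular library):

import base64
import hashlib
import hmac

def verify_code_challenge(code_verifier, stored_challenge, method="S256"):
    # Recompute the challenge from the verifier the client just presented
    if method == "S256":
        hashed = hashlib.sha256(code_verifier.encode()).digest()
        computed = base64.urlsafe_b64encode(hashed).decode().rstrip("=")
    else:
        # "plain" method: the verifier is used as-is
        computed = code_verifier
    # Constant-time comparison to avoid timing side channels
    return hmac.compare_digest(computed, stored_challenge)

If the recomputed value matches the stored code challenge, the authorization server issues the access token; otherwise it rejects the request.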
