URL Encode/Decode

URL percent-encoding and decoding

About URL Encoder/Decoder

URL encoding, also known as percent-encoding, is a mechanism for encoding information in Uniform Resource Identifiers (URIs). It replaces characters that are not allowed in a URI, or that carry a special meaning there, with a '%' followed by two hexadecimal digits for each byte.

URL encoding ensures that URLs are transmitted over the internet without being misinterpreted by browsers and servers, which makes it essential knowledge for web development.


Why URL Encoding is Needed

A URL may contain only a limited set of characters. Anything outside that set, and any reserved character used as literal data, must be encoded:

  • Reserved Characters: Characters with a special meaning in URLs (such as ?, &, =, /) must be encoded when they appear inside parameter values
  • Unsafe Characters: Characters that may cause ambiguity or security issues (such as spaces, <, >, ', ") must be encoded
  • Non-ASCII Characters: Chinese characters, emojis, and other Unicode characters are converted to UTF-8 bytes, and each byte is percent-encoded
  • Special Characters: Certain characters (such as #, %, +) mean different things in different contexts and must be encoded when used literally

Common Encoded Character Reference Table

  • Space: %20 or + (+ stands for a space only in form-style query parameters)
  • / (Slash): %2F
  • ? (Question Mark): %3F
  • & (Ampersand): %26
  • = (Equals): %3D
  • # (Hash): %23
  • % (Percent): %25
  • + (Plus): %2B
  • : (Colon): %3A
  • ; (Semicolon): %3B
  • Chinese Characters: UTF-8 encoded byte sequences, like "中" encoded as %E4%B8%AD
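
The table above can be reproduced with a few lines of Python. This is a minimal sketch using the standard library's urllib.parse.quote; the variable names samples and table are only illustrative.

```python
from urllib.parse import quote

# Characters from the reference table above.
samples = [" ", "/", "?", "&", "=", "#", "%", "+", ":", ";", "中"]

# safe="" tells quote() to escape every reserved character; by default it
# would leave "/" untouched.
table = {ch: quote(ch, safe="") for ch in samples}

for ch, encoded in table.items():
    print(f"{ch!r} -> {encoded}")
```

Note that quote() always emits %20 for a space; the + form appears only with the form-style helpers such as quote_plus().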

URL Safe Characters

The following characters do not need encoding and can safely appear in URLs:

  • Letters: a-z, A-Z
  • Digits: 0-9
  • Special Characters: - _ . ~

Note: Other characters (including reserved characters) may need encoding in specific contexts.
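
As a quick check of the safe set listed above, the following sketch passes those characters through Python's urllib.parse.quote. It assumes Python 3.7 or later, where ~ is treated as unreserved; the variable names are illustrative.

```python
import string
from urllib.parse import quote

# Letters, digits, and the four unreserved marks listed above.
unreserved = string.ascii_letters + string.digits + "-_.~"

# Unreserved characters pass through unchanged, even with safe="".
unchanged = quote(unreserved, safe="")

# Anything else (here a space and a slash) is escaped.
escaped = quote("a-b_c.d~e f/g", safe="")

print(unchanged == unreserved)  # True
print(escaped)                  # a-b_c.d~e%20f%2Fg
```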


Encoding Principles

  1. Convert each character to a byte sequence (typically using UTF-8)
  2. Write each byte as two hexadecimal digits
  3. Prefix each pair of digits with '%'
  4. Example: Chinese "中" → UTF-8 bytes [0xE4, 0xB8, 0xAD] → %E4%B8%AD
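
The steps above can be sketched by hand in Python. The function name percent_encode is hypothetical and exists only for illustration; in real code a library routine such as urllib.parse.quote is the safer choice.

```python
import string

# Characters that never need encoding (letters, digits, - _ . ~).
UNRESERVED = set(string.ascii_letters + string.digits + "-_.~")


def percent_encode(text: str) -> str:
    """Percent-encode every character outside the unreserved set."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
            continue
        # Step 1: character -> UTF-8 byte sequence.
        for byte in ch.encode("utf-8"):
            # Steps 2 and 3: byte -> two uppercase hex digits, prefixed with "%".
            out.append(f"%{byte:02X}")
    return "".join(out)


print(percent_encode("中"))           # %E4%B8%AD
print(percent_encode("hello world"))  # hello%20world
```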

Practical Use Cases

  • Query Parameters: ?q=hello%20world (space encoded as %20)
  • Form Submission: name=%E5%BC%A0%E4%B8%89 (Chinese "张三" encoded)
  • URL Redirection: ?redirect=https%3A%2F%2Fexample.com (the entire target URL is encoded)
  • Cookie Values: Certain special characters in cookies need encoding
  • File Names: Special characters in filenames need encoding when uploading
  • API Interfaces: Special characters in RESTful API parameters need encoding
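
The first three use cases can be reproduced with Python's urllib.parse.urlencode, shown here as a rough sketch. By default urlencode produces form-style output (spaces become +), so quote_via=quote is passed where %20 is wanted.

```python
from urllib.parse import quote, urlencode

# Query parameter with a space, encoded as %20 rather than +.
query = urlencode({"q": "hello world"}, quote_via=quote)

# Form field containing non-ASCII text.
form = urlencode({"name": "张三"})

# A complete URL passed as a parameter value.
redirect = urlencode({"redirect": "https://example.com"})

print(query)     # q=hello%20world
print(form)      # name=%E5%BC%A0%E4%B8%89
print(redirect)  # redirect=https%3A%2F%2Fexample.com
```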

URL Encoding in Programming Languages

  • JavaScript: encodeURIComponent() for encoding, decodeURIComponent() for decoding
  • Python: urllib.parse.quote() for encoding (pass safe="" to also encode /), urllib.parse.unquote() for decoding
  • Java: java.net.URLEncoder.encode() for encoding, java.net.URLDecoder.decode() for decoding (form-style: spaces become +)
  • PHP: urlencode() for encoding, urldecode() for decoding (form-style: spaces become +); rawurlencode() and rawurldecode() use %20
  • C#: Uri.EscapeDataString() for encoding, Uri.UnescapeDataString() for decoding
  • Go: url.QueryEscape() for encoding, url.QueryUnescape() for decoding (form-style: spaces become +); url.PathEscape() and url.PathUnescape() use %20

Frequently Asked Questions

Q: Should spaces be encoded as %20 or +?
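A: In query parameters (after the ? in a URL), spaces are commonly encoded as + for historical reasons; this is the HTML form encoding (application/x-www-form-urlencoded). In other parts of a URL, such as the path, spaces should be encoded as %20. Note that JavaScript's decodeURIComponent does not turn + into a space, so replace + with a space (or %20) before decoding form-encoded data.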
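
The two space conventions can be compared side by side in Python, where quote/unquote follow the %20 style and quote_plus/unquote_plus follow the form style. A small sketch:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "a b+c"

percent_style = quote(text, safe="")  # space -> %20, literal + -> %2B
form_style = quote_plus(text)         # space -> +,   literal + -> %2B

print(percent_style)  # a%20b%2Bc
print(form_style)     # a+b%2Bc

# Only the form-style decoder treats + as a space.
print(unquote("a+b"))       # a+b
print(unquote_plus("a+b"))  # a b
```

Decoding form-encoded data with the wrong function silently leaves + characters in the result, so the encoder and decoder should be chosen as a matching pair.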

Q: Why encode entire URLs?
A: When a URL is used as a parameter value, the entire URL must be encoded first so that its special characters do not conflict with those of the outer URL. For example: ?redirect=https%3A%2F%2Fexample.com, where %3A is the colon and %2F is the slash.
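
The conflict is easiest to see when the inner URL has a query string of its own. In the sketch below, the host app.example and the parameter names are made up for illustration.

```python
from urllib.parse import parse_qs, quote, urlsplit

target = "https://example.com/search?q=1&page=2"

# Unencoded: the inner "&page=2" is read as a parameter of the outer URL.
naive = "https://app.example/login?redirect=" + target
naive_params = parse_qs(urlsplit(naive).query)

# Encoded: the whole target travels as one opaque value.
outer = "https://app.example/login?redirect=" + quote(target, safe="")
params = parse_qs(urlsplit(outer).query)

print(naive_params)  # {'redirect': ['https://example.com/search?q=1'], 'page': ['2']}
print(params)        # {'redirect': ['https://example.com/search?q=1&page=2']}
```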

Q: How are Chinese characters encoded?
A: Chinese characters are first converted to UTF-8 byte sequences, then each byte is percent-encoded. For example: "你好" → UTF-8 [0xE4, 0xBD, 0xA0, 0xE5, 0xA5, 0xBD] → %E4%BD%A0%E5%A5%BD
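
The same example can be checked in both directions with Python's standard library; a short sketch:

```python
from urllib.parse import quote, unquote

text = "你好"

# UTF-8 bytes of the text, written the same way as in the answer above.
utf8_bytes = [f"0x{b:02X}" for b in text.encode("utf-8")]

encoded = quote(text)       # percent-encode each byte
decoded = unquote(encoded)  # decode back, assuming UTF-8 (the default)

print(utf8_bytes)  # ['0xE4', '0xBD', '0xA0', '0xE5', '0xA5', '0xBD']
print(encoded)     # %E4%BD%A0%E5%A5%BD
print(decoded)     # 你好
```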

Q: What's the difference between encodeURI and encodeURIComponent?
A: encodeURI encodes a complete URL and leaves characters with a structural meaning (such as :, /, ?, &, =, #) untouched; encodeURIComponent encodes a single URL component (such as a parameter value) and escapes those characters as well. In most cases, and always for parameter values, use encodeURIComponent.
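
The difference comes down to which characters each function leaves alone. The following Python sketch approximates the two JavaScript functions with urllib.parse.quote; the helper names encode_uri and encode_uri_component are hypothetical, and edge cases such as invalid surrogates are not modelled.

```python
from urllib.parse import quote

# Characters (besides letters and digits) that each JavaScript function
# leaves unescaped.
ENCODE_URI_SAFE = ";,/?:@&=+$-_.!~*'()#"
ENCODE_URI_COMPONENT_SAFE = "-_.!~*'()"


def encode_uri(value: str) -> str:
    """Rough analogue of JavaScript's encodeURI()."""
    return quote(value, safe=ENCODE_URI_SAFE)


def encode_uri_component(value: str) -> str:
    """Rough analogue of JavaScript's encodeURIComponent()."""
    return quote(value, safe=ENCODE_URI_COMPONENT_SAFE)


url = "https://example.com/a b?x=1&y=2"
print(encode_uri(url))            # https://example.com/a%20b?x=1&y=2
print(encode_uri_component(url))  # https%3A%2F%2Fexample.com%2Fa%20b%3Fx%3D1%26y%3D2
```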

Q: Can URL encoding prevent XSS attacks?
A: URL encoding mainly solves URL transmission problems and cannot prevent XSS on its own. Used properly, it does help avoid injection when URLs are constructed dynamically. Preventing XSS also requires HTML encoding, a Content Security Policy (CSP), and other security measures.
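
As an illustration of combining the two kinds of encoding, the sketch below URL-encodes a value for the query string and HTML-escapes everything written into the page. The /search path and the variable names are assumptions made for the example, and this is not a complete XSS defence.

```python
import html
from urllib.parse import quote

user_input = '"><script>alert(1)</script>'

# URL encoding keeps the value inside a single query parameter.
href = "/search?q=" + quote(user_input, safe="")

# HTML escaping is still needed for anything written into the page,
# both in the attribute and in the visible link text.
link = f'<a href="{html.escape(href, quote=True)}">{html.escape(user_input)}</a>'

print(href)
print(link)
```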

Q: How should partially encoded input be handled?
A: Avoid double encoding. If a URL is already partially encoded, decode it first and then re-encode it. This tool handles this case automatically, but take care when doing it by hand.
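
Double encoding is easy to spot because the % of an existing escape becomes %25. The sketch below shows the problem and a decode-then-encode helper; the name normalize is hypothetical, and the approach assumes the input contains no literal % that was meant as data.

```python
from urllib.parse import quote, unquote

once = quote("a b", safe="")
twice = quote(once, safe="")  # the "%" of "%20" is escaped again

print(once)   # a%20b
print(twice)  # a%2520b


def normalize(value: str) -> str:
    """Decode first, then encode, so existing escapes are not doubled."""
    return quote(unquote(value), safe="")


# Mixed input: one space already encoded, one still raw.
print(normalize("a%20b c"))  # a%20b%20c
```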


Encoding and Decoding Best Practices

  • When to Encode: When constructing URLs, encode parameter values, path components, and other dynamic parts
  • When to Decode: When reading URL parameters, decode the parameter values
  • Avoid Duplication: Do not re-encode content that is already encoded
  • Unified Encoding: Frontend and backend should use the same encoding rules
  • UTF-8 First: Use UTF-8 to encode non-ASCII characters; this is the Web standard
  • Validate Input: Validate decoded content to prevent injection attacks

URL Components

Understanding the parts of a URL helps you apply encoding correctly:

  • Protocol: https:// (usually doesn't need encoding)
  • Domain: www.example.com (usually doesn't need encoding, but internationalized domain names (IDNs) are encoded with Punycode rather than percent-encoding)
  • Port: :8080 (usually doesn't need encoding)
  • Path: /path/to/resource (special characters need encoding)
  • Query String: ?key=value&name=张三 (parameter values need encoding)
  • Fragment: #section (special characters need encoding)
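
The components listed above can be pulled apart with Python's urllib.parse.urlsplit. This is a small sketch built from the example values in the list; the IDN host bücher.example is an illustrative assumption.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "https://www.example.com:8080/path/to/resource"
    "?key=value&name=%E5%BC%A0%E4%B8%89#section"
)
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /path/to/resource
print(parts.query)     # key=value&name=%E5%BC%A0%E4%B8%89
print(parts.fragment)  # section

# parse_qs splits the query string and percent-decodes each value.
query = parse_qs(parts.query)
print(query)  # {'key': ['value'], 'name': ['张三']}

# Host names use Punycode instead of percent-encoding.
idn_host = "bücher.example".encode("idna").decode("ascii")
print(idn_host)  # xn--bcher-kva.example
```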

Special Characters Explained

  • # (Hash): Marks URL fragment, must be encoded as %23 in parameter values
  • & (Ampersand): Separates query parameters, must be encoded as %26 in parameter values
  • = (Equals): Separates parameter names and values; it is usually tolerated unencoded inside parameter values, but encoding it as %3D avoids ambiguity
  • + (Plus): Often used to represent spaces in query parameters, but + in actual values should be encoded as %2B
  • / (Slash): Separates path components, usually not encoded in path but should be encoded as %2F in parameter values
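
To see why these characters matter, the sketch below round-trips values containing them through Python's urlencode and parse_qs, then parses a hand-built query string where they were left unencoded. The parameter names are made up for the example.

```python
from urllib.parse import parse_qs, urlencode

params = {"company": "A&B", "formula": "1+1=2", "tag": "#1", "path": "a/b"}

# urlencode escapes every special character inside the values.
query = urlencode(params)
print(query)  # company=A%26B&formula=1%2B1%3D2&tag=%231&path=a%2Fb

# Decoding recovers the original values exactly.
round_trip = {key: values[0] for key, values in parse_qs(query).items()}
print(round_trip == params)  # True

# Without encoding, "&" splits the value and "+" is read as a space.
broken = parse_qs("company=A&B&formula=1+1=2")
print(broken)  # {'company': ['A'], 'formula': ['1 1=2']}
```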