URL Canonicalization

socials.fun normalizes URLs to ensure consistent token mapping regardless of how a link is formatted.

Why Canonicalization?

Social media URLs come in many forms:

Same tweet, different URLs:
- https://x.com/user/status/123
- https://twitter.com/user/status/123
- https://x.com/user/status/123?s=20
- https://x.com/user/status/123?s=20&t=abc123
- twitter.com/user/status/123
- x.com/user/status/123/

Without canonicalization, each would create a different token. With canonicalization, they all map to:

x.com/user/status/123

Canonicalization Rules

1. Protocol Removal

https://x.com/... → x.com/...
http://x.com/...  → x.com/...

2. WWW Removal

www.youtube.com/... → youtube.com/...

3. Domain Normalization

twitter.com/... → x.com/...
youtu.be/abc   → youtube.com/watch?v=abc

4. Trailing Slash Removal

x.com/user/status/123/ → x.com/user/status/123

5. Query Parameter Filtering

Only essential parameters are kept:

Platform           Kept Parameters
YouTube /watch     v (video ID only)
YouTube /playlist  list (playlist ID only)
X/Twitter          None (ID is in path)
TikTok             None (ID is in path)

Removed parameters:

  • Tracking: s, t, utm_*, ref, feature
  • Session: si, pp, _r, _t
  • Share metadata: context, share_id
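The filtering rule can be read as an allowlist: anything not explicitly kept is dropped. The following is a minimal sketch of that idea, not the actual socials.fun implementation. The names `KEPT_PARAMS` and `filterQueryParams` are hypothetical, the input is assumed to have its protocol and `www.` already removed, and dropping every parameter for unlisted hosts is an assumption of this sketch.

```typescript
// Hypothetical allowlist derived from the table above: host -> path -> kept parameters.
const KEPT_PARAMS: Record<string, Record<string, string[]>> = {
  "youtube.com": { "/watch": ["v"], "/playlist": ["list"] },
  "x.com": {},
  "tiktok.com": {},
};

// Expects a URL without protocol or www, e.g. "youtube.com/watch?v=abc&t=120".
function filterQueryParams(url: string): string {
  const queryStart = url.indexOf("?");
  if (queryStart === -1) return url; // no query string, nothing to filter

  const base = url.slice(0, queryStart);
  const query = url.slice(queryStart + 1).split("#")[0]; // ignore any fragment
  const slash = base.indexOf("/");
  const host = slash === -1 ? base : base.slice(0, slash);
  const path = slash === -1 ? "/" : base.slice(slash);

  // Assumption: hosts and paths missing from the allowlist keep no parameters.
  const allowed = KEPT_PARAMS[host.toLowerCase()]?.[path] ?? [];
  const params = new URLSearchParams(query);
  const kept = new URLSearchParams();
  for (const name of allowed) {
    const value = params.get(name);
    if (value !== null) kept.set(name, value);
  }

  const rest = kept.toString();
  return rest ? `${base}?${rest}` : base;
}
```

An allowlist keeps the canonical form stable when a platform introduces a new tracking parameter, which a blocklist of known tracking names would let through.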

6. Short URL Expansion

Short URLs are expanded to their canonical form:

vm.tiktok.com/ZMx123/     → tiktok.com/@user/video/456789
t.co/abc123               → (expanded to full URL)
youtu.be/abc123           → youtube.com/watch?v=abc123

⚠️ Short URL expansion requires an API call, which may add ~500ms to token creation. The expanded URL is cached for future lookups.
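One way to picture expansion with caching is sketched below. This is an illustration under stated assumptions, not the service's real code: `SHORT_HOSTS`, `Resolver`, `fetchResolver`, and `expandShortUrl` are hypothetical names, the cache is a plain in-memory `Map`, and the default resolver simply follows HTTP redirects with the standard `fetch` API.

```typescript
// Hosts treated as short-link domains (taken from the examples above).
const SHORT_HOSTS = new Set(["vm.tiktok.com", "t.co", "youtu.be"]);

// A resolver takes a short URL (without protocol) and returns its target.
type Resolver = (shortUrl: string) => Promise<string>;

// Assumed default: follow redirects and read the final URL from the response.
const fetchResolver: Resolver = async (shortUrl) => {
  const response = await fetch(`https://${shortUrl}`, { redirect: "follow" });
  return response.url;
};

// In-memory cache so each short URL is resolved at most once.
const expansionCache = new Map<string, string>();

async function expandShortUrl(
  url: string,
  resolve: Resolver = fetchResolver,
): Promise<string> {
  const host = url.split("/")[0].toLowerCase();
  if (!SHORT_HOSTS.has(host)) return url; // not a short link, return unchanged

  const cached = expansionCache.get(url);
  if (cached !== undefined) return cached; // cache hit, no network request

  const expanded = await resolve(url);
  expansionCache.set(url, expanded);
  return expanded;
}
```

The resolver is injectable so the lookup can be replaced in tests. The expanded URL would still need to pass through the other rules, since redirect targets usually carry a protocol and tracking parameters.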

Examples

X/Twitter

Input                                           Canonical
https://x.com/elonmusk/status/123?s=20&t=xyz    x.com/elonmusk/status/123
https://twitter.com/elonmusk/status/123         x.com/elonmusk/status/123
twitter.com/elonmusk/status/123/                x.com/elonmusk/status/123

YouTube

Input                                                Canonical
https://www.youtube.com/watch?v=abc&feature=share    youtube.com/watch?v=abc
https://youtu.be/abc                                 youtube.com/watch?v=abc
https://youtube.com/watch?v=abc&t=120                youtube.com/watch?v=abc

TikTok

Input                                          Canonical
https://www.tiktok.com/@user/video/123?_r=1    tiktok.com/@user/video/123
https://vm.tiktok.com/ZMx123/                  tiktok.com/@user/video/123 (expanded)

Technical Implementation

Canonicalization happens in two places:

  1. Frontend: Instant validation and preview
  2. Backend: Final canonicalization before token creation

The backend is authoritative: if the frontend and backend produce different results, the backend's canonical URL is used.
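The precedence rule can be expressed in a few lines. This is only a sketch: `CanonicalResult` and `reconcile` are hypothetical names, and the `corrected` flag is an assumption about how a client might learn that its preview differed from the final value.

```typescript
interface CanonicalResult {
  canonical: string;  // the URL actually used for token mapping
  corrected: boolean; // true when the frontend preview did not match
}

// The backend value always wins; the frontend value is only compared against it.
function reconcile(frontendCanonical: string, backendCanonical: string): CanonicalResult {
  return {
    canonical: backendCanonical,
    corrected: frontendCanonical !== backendCanonical,
  };
}
```

A frontend could use the flag to refresh its preview with the backend's value.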

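// Simplified canonicalization logic.
// stripTrackingParams is a helper that applies the per-domain query rules (not shown here).
function canonicalizeUrl(url: string): string {
  let canonical = url
    .trim()
    .replace(/^https?:\/\//i, '')  // Remove protocol
    .replace(/^www\./i, '');       // Remove www

  // Domain normalization, anchored so only the leading host is rewritten
  canonical = canonical.replace(/^twitter\.com(?=[/?#]|$)/i, 'x.com');

  // Strip query params per domain rules
  canonical = stripTrackingParams(canonical);

  // Remove the trailing slash last, so "x.com/user/status/123/?s=20" is handled too
  return canonical.replace(/\/$/, '');
}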
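For comparison, here is a self-contained sketch of rules 1 to 5 built on the standard `URL` class instead of string replacement, checked against the example tables. It is an illustration, not the production code: `canonicalizeSketch` is a hypothetical name, unlisted hosts are assumed to keep no query parameters, and short-link expansion (rule 6) is left out because it needs a network request.

```typescript
function canonicalizeSketch(input: string): string {
  // Rule 1: drop any protocol, then re-add https:// so the URL class can parse it.
  const withoutProtocol = input.trim().replace(/^https?:\/\//i, "");
  const url = new URL(`https://${withoutProtocol}`);

  let host = url.hostname.replace(/^www\./, "");  // Rule 2: www removal
  let path = url.pathname.replace(/\/+$/, "");    // Rule 4: trailing slash removal
  const params = url.searchParams;

  // Rule 3: domain normalization.
  if (host === "twitter.com") host = "x.com";
  if (host === "youtu.be") {
    // youtu.be/<id> becomes youtube.com/watch?v=<id>
    params.set("v", path.slice(1));
    host = "youtube.com";
    path = "/watch";
  }

  // Rule 5: keep only the parameters listed in the table.
  const allowed =
    host === "youtube.com" && path === "/watch" ? ["v"] :
    host === "youtube.com" && path === "/playlist" ? ["list"] : [];
  const kept = new URLSearchParams();
  for (const name of allowed) {
    const value = params.get(name);
    if (value !== null) kept.set(name, value);
  }

  const query = kept.toString();
  return host + path + (query ? `?${query}` : "");
}

// Rows from the example tables (short-link rows that need expansion are omitted).
const examples: Array<[string, string]> = [
  ["https://x.com/elonmusk/status/123?s=20&t=xyz", "x.com/elonmusk/status/123"],
  ["https://twitter.com/elonmusk/status/123", "x.com/elonmusk/status/123"],
  ["twitter.com/elonmusk/status/123/", "x.com/elonmusk/status/123"],
  ["https://www.youtube.com/watch?v=abc&feature=share", "youtube.com/watch?v=abc"],
  ["https://youtu.be/abc", "youtube.com/watch?v=abc"],
  ["https://youtube.com/watch?v=abc&t=120", "youtube.com/watch?v=abc"],
  ["https://www.tiktok.com/@user/video/123?_r=1", "tiktok.com/@user/video/123"],
];

for (const [input, expected] of examples) {
  console.log(canonicalizeSketch(input) === expected ? "ok  " : "FAIL", input);
}
```

Parsing with `URL` separates host, path, and query, so a rewrite such as `twitter.com` to `x.com` cannot accidentally match text elsewhere in the link.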