URL Canonicalization
socials.fun normalizes URLs to ensure consistent token mapping regardless of how a link is formatted.
Why Canonicalization?
Social media URLs come in many forms:
Same tweet, different URLs:
- https://x.com/user/status/123
- https://twitter.com/user/status/123
- https://x.com/user/status/123?s=20
- https://x.com/user/status/123?s=20&t=abc123
- twitter.com/user/status/123
- x.com/user/status/123/
Without canonicalization, each would create a different token. With canonicalization, they all map to:
x.com/user/status/123
Canonicalization Rules
1. Protocol Removal
https://x.com/... → x.com/...
http://x.com/... → x.com/...
2. WWW Removal
www.youtube.com/... → youtube.com/...
3. Domain Normalization
twitter.com/... → x.com/...
youtu.be/abc → youtube.com/watch?v=abc
4. Trailing Slash Removal
x.com/user/status/123/ → x.com/user/status/123
5. Query Parameter Filtering
Only essential parameters are kept:
| Platform | Kept Parameters |
|---|---|
| YouTube /watch | v (video ID only) |
| YouTube /playlist | list (playlist ID only) |
| X/Twitter | None (ID is in path) |
| TikTok | None (ID is in path) |
Removed parameters:
- Tracking: s, t, utm_*, ref, feature
- Session: si, pp, _r, _t
- Share metadata: context, share_id
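
The filtering rule can be sketched as an allowlist lookup keyed by host and path. This is a minimal illustration, not the production code: the names `keptParamsFor` and `filterQueryParams` are hypothetical, the input is assumed to already have its protocol and `www.` removed, and URL fragments are not handled.

```typescript
// Hypothetical helper: which query parameters survive for a given "host/path".
// Mirrors the Kept Parameters table; anything not listed keeps nothing.
function keptParamsFor(hostAndPath: string): string[] {
  switch (hostAndPath) {
    case 'youtube.com/watch':
      return ['v']; // video ID
    case 'youtube.com/playlist':
      return ['list']; // playlist ID
    default:
      return []; // X/Twitter, TikTok: the ID is in the path
  }
}

// Hypothetical sketch: keep only allowlisted parameters, drop everything else.
function filterQueryParams(url: string): string {
  const queryStart = url.indexOf('?');
  if (queryStart === -1) return url; // nothing to filter

  const base = url.slice(0, queryStart);
  const query = url.slice(queryStart + 1);
  // Ignore a trailing slash when looking up the allowlist.
  const allowed = keptParamsFor(base.replace(/\/$/, ''));

  const kept = query
    .split('&')
    .filter((pair) => allowed.indexOf(pair.split('=')[0]) !== -1);

  return kept.length > 0 ? base + '?' + kept.join('&') : base;
}
```

Using an allowlist rather than a blocklist means a newly introduced tracking parameter is dropped by default, which matches the "only essential parameters are kept" rule.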
6. Short URL Expansion
Short URLs are expanded to their canonical form:
vm.tiktok.com/ZMx123/ → tiktok.com/@user/video/456789
t.co/abc123 → (expanded to full URL)
youtu.be/abc123 → youtube.com/watch?v=abc123
⚠️ Short URL expansion requires an API call, which may add ~500ms to token creation. The expanded URL is cached for future lookups.
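
One way to picture the caching behaviour is a wrapper that remembers the result of each lookup. The sketch below is an assumption about shape only: `isShortUrl`, `Resolver`, and `createCachedExpander` are hypothetical names, the cache is in-memory, and the resolver that actually performs the API call is injected rather than shown.

```typescript
// Short-link hosts named in this section.
const SHORT_HOSTS = ['vm.tiktok.com', 't.co', 'youtu.be'];

// True when a protocol-less URL such as "t.co/abc123" starts with a short-link host.
function isShortUrl(url: string): boolean {
  const host = url.split('/')[0].toLowerCase();
  return SHORT_HOSTS.indexOf(host) !== -1;
}

// Hypothetical resolver type: performs the API call and returns the expanded URL.
type Resolver = (shortUrl: string) => Promise<string>;

// Wraps a resolver so each short URL triggers at most one lookup.
// The pending Promise itself is cached, so concurrent requests share one call.
function createCachedExpander(resolve: Resolver): Resolver {
  const cache = new Map<string, Promise<string>>();
  return (shortUrl: string): Promise<string> => {
    let pending = cache.get(shortUrl);
    if (pending === undefined) {
      pending = resolve(shortUrl);
      cache.set(shortUrl, pending);
    }
    return pending;
  };
}
```

With this shape, only the first lookup of a given short URL pays the expansion latency. A real implementation would likely also evict failed lookups and persist the cache, neither of which is shown here.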
Examples
X/Twitter
| Input | Canonical |
|---|---|
| https://x.com/elonmusk/status/123?s=20&t=xyz | x.com/elonmusk/status/123 |
| https://twitter.com/elonmusk/status/123 | x.com/elonmusk/status/123 |
| twitter.com/elonmusk/status/123/ | x.com/elonmusk/status/123 |
YouTube
| Input | Canonical |
|---|---|
| https://www.youtube.com/watch?v=abc&feature=share | youtube.com/watch?v=abc |
| https://youtu.be/abc | youtube.com/watch?v=abc |
| https://youtube.com/watch?v=abc&t=120 | youtube.com/watch?v=abc |
TikTok
| Input | Canonical |
|---|---|
| https://www.tiktok.com/@user/video/123?_r=1 | tiktok.com/@user/video/123 |
| https://vm.tiktok.com/ZMx123/ | tiktok.com/@user/video/456789 (expanded) |
Technical Implementation
Canonicalization happens in two places:
- Frontend: Instant validation and preview
- Backend: Final canonicalization before token creation
The backend is authoritative: if the frontend and backend produce different results, the backend's canonical URL is used.
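// Simplified canonicalization logic.
// stripTrackingParams (not shown) applies the per-platform rules from Query Parameter Filtering.
function canonicalizeUrl(url: string): string {
  let canonical = url
    .trim()
    .replace(/^https?:\/\//i, '') // Remove protocol
    .replace(/^www\./i, '');      // Remove www
  // Domain normalization, anchored so only the host itself is rewritten
  canonical = canonical.replace(/^twitter\.com(?=[\/?#]|$)/i, 'x.com');
  // Strip query params per domain rules
  canonical = stripTrackingParams(canonical);
  // Remove the trailing slash last, so inputs like ".../123/?s=20" are covered
  return canonical.replace(/\/$/, '');
}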
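
The simplified function above only rewrites twitter.com, while the Domain Normalization rule also maps youtu.be links to youtube.com/watch?v=.... A possible sketch of that step follows. `normalizeDomain` is a hypothetical name, the input is assumed to already have its protocol and `www.` removed, and dropping the short link's own query string is an assumption consistent with the YouTube examples.

```typescript
// Hypothetical sketch of the Domain Normalization rule.
function normalizeDomain(url: string): string {
  // Split into host and the rest at the first "/", "?" or "#".
  const end = url.search(/[\/?#]/);
  const host = (end === -1 ? url : url.slice(0, end)).toLowerCase();
  const rest = end === -1 ? '' : url.slice(end);

  if (host === 'twitter.com') {
    return 'x.com' + rest;
  }

  if (host === 'youtu.be') {
    // "/abc?si=1" yields video ID "abc"; the short link's query string is dropped.
    const videoId = rest.charAt(0) === '/' ? rest.slice(1).split(/[\/?#]/)[0] : '';
    return videoId ? 'youtube.com/watch?v=' + videoId : 'youtube.com';
  }

  // Unknown hosts pass through unchanged apart from lowercasing the host.
  return host + rest;
}
```

Comparing the whole host, rather than substituting the first occurrence of "twitter.com", avoids rewriting unrelated domains that merely contain that string.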