How to create a Domain Name Validator in Python

The challenge

Create a domain name validator mostly compliant with RFC 1035, RFC 1123, and RFC 2181

The following rules apply:

  • Domain name may contain subdomains (levels), hierarchically separated by . (period) character
  • Domain name must not contain more than 127 levels, including top level (TLD)
  • Domain name must not be longer than 253 characters (RFC specifies 255, but 2 characters are reserved for trailing dot and null character for root level)
  • Level names must be composed of lowercase and uppercase ASCII letters, digits and the - (minus sign) character
  • Level names must not start or end with the - (minus sign) character
  • Level names must not be longer than 63 characters
  • Top level (TLD) must not be fully numerical
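
One way to see where the two reserved characters in the 253-character rule go is to count the bytes a name occupies in DNS wire format: each level is stored as one length byte followed by its characters, and the name ends with a single zero byte for the root level. The sketch below does that arithmetic; `wire_length` is a hypothetical helper name used only for illustration, and it assumes the name is written without a trailing dot.

```python
def wire_length(domain):
    """Return the number of bytes the name occupies in DNS wire format.

    Hypothetical helper for illustration; assumes `domain` has no
    trailing dot and no empty levels.
    """
    labels = domain.split('.')
    # One length byte per level plus the level's characters,
    # then a single zero byte for the root level.
    return sum(1 + len(label) for label in labels) + 1


# The wire form is always two bytes longer than the dotted text form.
print(wire_length('amazon.com'), len('amazon.com') + 2)

# A 253-character name therefore fills the 255-byte limit exactly.
longest = '.'.join(['a' * 63, 'b' * 63, 'c' * 63, 'd' * 61])
print(len(longest), wire_length(longest))
```

So a text limit of 253 characters and a wire limit of 255 bytes describe the same boundary.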

Additionally:

  • Domain name must contain at least one subdomain (level) apart from TLD
  • Top level validation must be naive, i.e. TLDs that do not exist in the IANA registry are still considered valid as long as they adhere to the rules given above.

The validation function accepts a string with the full domain name and returns a boolean value indicating whether the domain name is valid or not.

Examples:

validate('aoms') == False
validate('ao.ms') == True
validate('amazon.com') == True
validate('AMAZON.COM') == True
validate('sub.amazon.com') == True
validate('amazon.com-') == False
validate('.amazon.com') == False
validate('[email protected]') == False
validate('127.0.0.1') == False

The solution in Python

Option 1:

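import re

def validate(domain):
    # fullmatch() requires the pattern to consume the whole string, so input
    # such as '146.186.4 244' is rejected; re.A keeps \d and [a-z] ASCII-only.
    # bool() turns the match object (or None) into the required boolean.
    return bool(re.fullmatch(r'''
        (?=^.{,253}$)          # max. length 253 chars
        (?!^.+\.\d+$)          # TLD is not fully numerical
        (?=^[^-.].+[^-.]$)     # doesn't start/end with '-' or '.'
        (?!^.+(\.-|-\.).+$)    # levels don't start/end with '-'
        (?:[a-z\d-]            # uses only allowed chars
        {1,63}(\.|$))          # max. level length 63 chars
        {2,127}                # max. 127 levels
        ''', domain, re.X | re.I | re.A))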
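
A detail worth checking with any regex-based validator: `re.match` anchors only at the start of the string, so a pattern that ends in a repeated group can succeed on a valid prefix and ignore whatever follows. The sketch below uses a simplified, hypothetical pattern (`LEVELS`, a stand-in rather than the full pattern from Option 1) to show how `re.match` and `re.fullmatch` differ on input that contains a space.

```python
import re

# Simplified stand-in pattern for illustration only: two or more
# dot-separated levels made of ASCII letters, digits and '-'.
LEVELS = re.compile(r'(?:[a-z\d-]{1,63}(?:\.|$)){2,127}', re.I | re.A)

candidate = 'sub.amazon.co m'   # contains a space, so it is not a valid name

# match() succeeds on the prefix 'sub.amazon.' and ignores what follows.
print(bool(LEVELS.match(candidate)))      # True
# fullmatch() requires the pattern to consume the entire string.
print(bool(LEVELS.fullmatch(candidate)))  # False
```

Using `re.fullmatch`, or ending the pattern with an explicit anchor, closes that gap.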

Option 2:

def validate(domain):
    # Overall length: 1 to 253 characters.
    if len(domain) > 253 or len(domain) == 0:
        return False
    els = domain.split('.')
    # At least two levels (name plus TLD), at most 127.
    if len(els) > 127 or len(els) < 2:
        return False
    for x in els:
        # Each level: 1 to 63 characters.
        if len(x) > 63 or len(x) == 0:
            return False
        # A level must start and end with a letter or digit, not '-'.
        if not x[0].isalnum() or not x[-1].isalnum():
            return False
        # Only ASCII letters, digits and '-' are allowed.
        for c in x:
            if c != '-' and (ord(c) >= 128 or not c.isalnum()):
                return False
    # The TLD must not be fully numerical.
    if els[-1].isnumeric():
        return False
    return True

Option 3:

import re

def validLevel(lvl):
    # fullmatch() (rather than match() with '$') also rejects a trailing newline.
    return not bool(re.search(r'^-|-$', lvl)) and bool(re.fullmatch(r'[a-zA-Z0-9-]{1,63}', lvl))

def validate(domain):
    lst = domain.split('.')
    return len(domain) <= 253 \
        and 2 <= len(lst) <= 127 \
        and not lst[-1].isdigit() \
        and all( validLevel(lvl) for lvl in lst )

Test cases to validate our solution

test.describe('Domain name validator tests')
test.expect(not validate('aoms'))
test.expect(validate('ao.ms'))
test.expect(validate('amazon.com'))
test.expect(validate('AMAZON.COM'))
test.expect(validate('sub.amazon.com'))
test.expect(not validate('amazon.com-'))
test.expect(not validate('.amazon.com'))
test.expect(not validate('[email protected]'))
test.expect(not validate('127.0.0.1'))
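
The `test.describe` and `test.expect` calls come from a kata test framework. As a framework-free sketch, similar checks plus a few boundary cases taken from the rules can be written with plain `assert` statements. The `validate` defined here is only a compact stand-in (an assumption, so that the block runs on its own); another implementation can be dropped in to try it against the same table.

```python
import re

def validate(domain):
    """Compact stand-in validator, included so the sketch is runnable."""
    levels = domain.split('.')
    return (
        len(domain) <= 253
        and 2 <= len(levels) <= 127
        and not levels[-1].isdigit()
        and all(
            re.fullmatch(r'[A-Za-z0-9-]{1,63}', lvl) is not None
            and not lvl.startswith('-')
            and not lvl.endswith('-')
            for lvl in levels
        )
    )

cases = [
    ('aoms', False),
    ('ao.ms', True),
    ('AMAZON.COM', True),
    ('amazon.com-', False),
    ('.amazon.com', False),
    ('127.0.0.1', False),
    ('a' * 63 + '.com', True),        # level of exactly 63 characters
    ('a' * 64 + '.com', False),       # level one character too long
    ('.'.join(['a'] * 127), True),    # 127 levels, 253 characters
    ('.'.join(['a'] * 128), False),   # 128 levels
    ('sub.amazon.co m', False),       # space inside a level
]

for name, expected in cases:
    assert validate(name) == expected, name
print('all cases passed')
```

Note that 127 single-character levels already add up to 253 characters, so the level limit and the length limit meet at the same boundary.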

1 Comment
Dan Ehrlich
3 months ago

Thank you for making this available. I am trying to use it to validate a field that can contain either an IP address or a domain name. There is no input validation happening before I get the data and a user input ‘146.186.4 244’ instead of ‘146.186.4.244’. The IP address validation failed, but the regex claims this is a valid FQDN. I am pretty sure spaces are not allowed in a FQDN. Any chance to get an updated version? My regex skills are poor at best.

Thanks,
Dan Ehrlich