Validating an email address

First of all, how is an email address composed? Take this one as an example:

test.email@yopmail.com

There is a “domain part”: the machine name and domain name of the host that holds the mailbox you want to write to. Here it is yopmail.com.

Then there is a “local part”: a permanent label that tells the destination mail server who to deliver the mail to on that server. It defines the mail drop, so here it is test.email.

The “@” symbol means “at”: The mail must be sent to test.email at yopmail.com.
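As a minimal sketch of that decomposition, the address can be split at the “@” symbol. The function name split_address is an illustrative choice, not something from a standard library, and splitting at the last “@” is an assumption made so that a quoted local part containing “@” would still split correctly.

```python
def split_address(address):
    """Split an address into (local part, domain part) at the last "@"."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("no '@' symbol in address")
    return local, domain


print(split_address("test.email@yopmail.com"))  # ('test.email', 'yopmail.com')
```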

These are the basics of an email address. A series of rules is then applied to these elements to decide whether a text string really is a valid email address:

  • The address must contain one “@” symbol, not more, not less
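  • The complete address cannot be more than 320 characters long, including the “@” (64 for the local part, 1 for the “@” and 255 for the domain part). In practice the SMTP limit of 256 characters on a forward or reverse path leaves at most 254 characters for an address that can actually be used in a mail transaction.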
  • The domain part must have at least 2 characters after the “@”
  • The domain part must contain at least one period “.”
  • The domain part must have at least 2 characters after the last period, unless it is an IP address, in which case it must be 4 numbers between 0 and 255 separated by periods (a dotted quad).
  • If the domain part is an IP address, it must be contained within square brackets ( @[1.2.3.4] ). This is not recommended.
  • The domain part cannot be more than 255 characters long
  • The domain part can only contain the following ASCII characters:
    – ASCII code 48-57 ( “0” to “9”)
    – ASCII code 65-90 and 97-122 (“A” to “Z” and “a” to “z”)
    – ASCII code 45 ( - )
    – ASCII code 46 ( . )
    – ASCII code 95 ( _ )
  • The local part cannot be more than 64 characters long
  • The local part can only contain the following ASCII characters – with some exceptions (see below):
    – ASCII code 45 ( - )
    – ASCII code 46 ( . )
    – ASCII code 95 ( _ )
    – ASCII code 48-57 ( “0” to “9”)
    – ASCII code 65-90 and 97-122 (“A” to “Z” and “a” to “z”)
  • Exceptional characters: the following ASCII characters are authorised but rare. Even though the SMTP and POP standards accept them, service providers (for example, Hotmail) often refuse them, so they are not recommended:
    – ASCII code 33 ( ! )
    – ASCII code 35 ( # )
    – ASCII code 36 ( $ )
    – ASCII code 37 ( % )
    – ASCII code 42 ( * )
    – ASCII code 47 ( / )
    – ASCII code 63 ( ? )
    – ASCII code 94 ( ^ )
    – ASCII code 96 ( ` )
    – ASCII code 123 ( { )
    – ASCII code 124 ( | )
    – ASCII code 125 ( } )
    – ASCII code 126 ( ~ )
    – ASCII code 61 ( = )
    – ASCII code 43 ( + )
    – ASCII code 39 ( ' )
    – ASCII code 38 ( & )
  • A local part can be completely enclosed in double quotes (") but this is uncommon and should be treated as not recommended. Quoting is used to allow normally forbidden characters within the local part of an email address.
  • If you accept local parts between double quotes, the quotes must enclose the complete local part up to the “@” character, and you must accept any ASCII character in the 32-126 range between them: the whole 7-bit ASCII character set except the first 32 control characters and DEL (127).
  • A period cannot start or end a local part
  • 2 periods cannot follow each other (..)
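
The rules above can be sketched as a small Python validator. This is an illustration under stated assumptions, not a reference implementation: the function names are made up for the example, letters are taken as “A” to “Z” and “a” to “z” only, the 320-character limit is the figure from the list, the “no two periods in a row” rule is applied to both parts, and the quoted local part check is simplified (it does not verify that inner quotes or backslashes are escaped).

```python
import re

MAX_ADDRESS_LENGTH = 320  # figure from the list above
# Unquoted local part: letters, digits, ".", "_", "-" plus the "exceptional" characters.
LOCAL_CHARS = re.compile(r"[A-Za-z0-9._!#$%*/?^`{|}~=+'&-]+")
DOMAIN_CHARS = re.compile(r"[A-Za-z0-9._-]+")


def is_dotted_quad(text):
    """True for 4 numbers between 0 and 255 separated by periods."""
    parts = text.split(".")
    return len(parts) == 4 and all(
        re.fullmatch(r"[0-9]{1,3}", p) and int(p) <= 255 for p in parts
    )


def valid_domain(domain):
    if len(domain) > 255:
        return False
    # An IP address must be wrapped in square brackets.
    if domain.startswith("[") and domain.endswith("]"):
        return is_dotted_quad(domain[1:-1])
    if is_dotted_quad(domain):
        return False  # bare IP address without brackets
    if len(domain) < 2 or "." not in domain or ".." in domain:
        return False
    if not DOMAIN_CHARS.fullmatch(domain):
        return False
    # At least 2 characters after the last period.
    return len(domain.rsplit(".", 1)[1]) >= 2


def valid_local(local):
    if not local or len(local) > 64:
        return False
    # Quoted form (simplified): any ASCII character 32-126 between the quotes.
    if len(local) >= 2 and local[0] == '"' and local[-1] == '"':
        return all(32 <= ord(c) <= 126 for c in local[1:-1])
    if local.startswith(".") or local.endswith(".") or ".." in local:
        return False
    return LOCAL_CHARS.fullmatch(local) is not None


def is_valid_email(address):
    if len(address) > MAX_ADDRESS_LENGTH:
        return False
    # Split at the last "@"; an extra "@" in an unquoted local part
    # is rejected by LOCAL_CHARS, which does not include "@".
    local, sep, domain = address.rpartition("@")
    if not sep:
        return False
    return valid_local(local) and valid_domain(domain)


for candidate in ("test.email@yopmail.com", "user@[1.2.3.4]", ".bad@yopmail.com"):
    print(candidate, is_valid_email(candidate))
```

Keeping the local and domain checks in separate functions mirrors the way the rules are grouped, and makes it easy to tighten one side without touching the other.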

It is then up to you to decide whether to accept or reject an address that follows a “not recommended” format, but it must still respect all the rules.
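
One way to express that choice is a policy switch applied after the address has already passed the rules. The sketch below is only an illustration: the names is_recommended and passes_policy and the allow_not_recommended flag are hypothetical, and the function assumes the address it receives is already valid.

```python
import string

# The common character set for a local part: letters, digits, "-", "." and "_".
RECOMMENDED_CHARS = set(string.ascii_letters + string.digits + "-._")


def is_recommended(address):
    """True if an already valid address avoids the "not recommended" forms."""
    local, _, domain = address.rpartition("@")
    quoted = local.startswith('"')    # local part between double quotes
    ip_domain = domain.startswith("[")  # IP address as the domain part
    rare_chars = not set(local) <= RECOMMENDED_CHARS
    return bool(local) and not (quoted or ip_domain or rare_chars)


def passes_policy(address, allow_not_recommended=False):
    # Assumes the address has already been checked against all the rules.
    return allow_not_recommended or is_recommended(address)


print(passes_policy("test.email@yopmail.com"))
print(passes_policy("user+tag@example.com"))
print(passes_policy("user+tag@example.com", allow_not_recommended=True))
```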

Finally, current internet messaging standards give you no reliable way to check that the local part of an email address exists on a server. The SMTP protocol does define 2 commands, “EXPN” (get the members of a mailing list) and “VRFY” (check a mail or mailing list address on a server), but given the number of unscrupulous people and the amount of spam and other unwanted email this could generate, these commands are almost always deactivated on internet-facing mail servers.
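
To see what this looks like from the client side, here is a hedged sketch using Python's standard smtplib module, whose SMTP.verify method sends VRFY and returns the reply code and message. The helper names, the host argument and the wording of the descriptions are assumptions made for the example; a server with VRFY disabled will typically answer 252 or 502 rather than confirm anything.

```python
import smtplib


def describe_vrfy_reply(code):
    """Map an SMTP reply code received for VRFY to a rough meaning."""
    if code in (250, 251):
        return "address confirmed"
    if code == 252:
        return "server will neither confirm nor deny"
    if code in (500, 502):
        return "VRFY not recognised or disabled"
    return "address rejected or other error"


def try_vrfy(host, address, timeout=10):
    # host is a mail server name supplied by the caller (hypothetical here).
    with smtplib.SMTP(host, 25, timeout=timeout) as server:
        server.ehlo_or_helo_if_needed()  # greet the server before VRFY
        code, _message = server.verify(address)
    return describe_vrfy_reply(code)


print(describe_vrfy_reply(252))
```

Only describe_vrfy_reply runs without a network connection; try_vrfy needs a reachable mail server and should be used only against servers you are allowed to probe.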
