Regular expression fun with emails; top level domain not required when it should be

I'm trying to create a regular expression that will filter valid emails using PHP, and I have run into an issue that conflicts with what I understand of regular expressions. Here is the code that I am using.

if (!preg_match('/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+.[a-zA-Z]{2,4}$/', $string)) {
    return false;
}

Now from the materials that I've researched, this should allow content before the @ to be multiple letters, numbers, underscores and periods, then afterwards to allow multiple letters and numbers, then require a period, then two to four letters for the top level domain.

However, right now it ignores the requirement for having the top level domain section. For example a@b.c obviously is valid (and should be), but a@b is also returning as valid, and I want it to be flagged as invalid.

I'm sure I'm missing something, but after browsing Google for an hour I'm at a loss as to what it could be. Anyone have an answer for this conundrum?

EDIT: The speed at which answers arrive here makes this site superior to its competitors. Well done!

13.10.2009 18:49:03
Your regular expression does not match a@b.c.d.
Greg Hewgill 13.10.2009 18:54:50
Is it supposed to match any email address, meaning just check if it's a valid one? Check out PHP's own filter_var function with the FILTER_VALIDATE_EMAIL constant. It might do the trick just fine.
Jörg 13.10.2009 18:58:27
Ya I think I might just use it. This isn't behaving as I've been told through multiple sources.
canadiancreed 13.10.2009 19:10:36
6 ANSWERS
SOLUTION

You should escape . when it is not inside a character class: '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/' Otherwise it will match any character:

  • . - any character (except the newline \n unless the s modifier is used)
  • \. - a literal dot
  • [.] - a literal dot (inside a character class)
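
To make the difference concrete, here is a minimal sketch comparing the question's pattern with and without the escaped dot. The variable names and the test address john@exampleXcom are made up for illustration; only preg_match itself comes from PHP.

```php
<?php
// Pattern from the question: the bare . before the TLD matches ANY character.
$unescaped = '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+.[a-zA-Z]{2,4}$/';
// Corrected pattern: \. matches only a literal dot.
$escaped   = '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/';

// Hypothetical address with no dot at all in the domain part.
$address = 'john@exampleXcom';

// The bare . happily consumes the "X", so this address is accepted.
var_dump(preg_match($unescaped, $address)); // int(1)
// With \. there is no literal dot to match, so it is rejected.
var_dump(preg_match($escaped, $address));   // int(0)
// A normal address passes both patterns.
var_dump(preg_match($escaped, 'john@example.com')); // int(1)
```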
5
13.10.2009 19:00:22
Instead of \., I find [.] to be more readable. It puts the . character into its own group.
Thomas Owens 13.10.2009 18:55:41
Agreed, although it didn't make a difference. Both \. and [.] still say that the email passed is valid.
canadiancreed 13.10.2009 19:01:38
I've just executed var_dump(preg_match('/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/', 'a@basd')); and it prints int(0) which is false
Ivan Nevostruev 13.10.2009 19:15:24
Yep I found the mistake on my end. My apologies for the erroneous reply earlier.
canadiancreed 13.10.2009 19:30:18
john@some.domain.co.uk or john@example.co.uk won't validate with that regular expression.
Mauricio 23.10.2009 22:50:02
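
One possible way to accept multi-label domains such as the ones mentioned in the comment above is to repeat the "label plus dot" part. This is only a sketch built on the pattern from this answer, not a complete email validator; the variable name is made up, and the {2,4} limit on the last label is kept as in the original.

```php
<?php
// (?:[-a-zA-Z0-9]+\.)+ allows one or more dot-terminated labels before the TLD.
$pattern = '/^[-a-zA-Z0-9_.]+@(?:[-a-zA-Z0-9]+\.)+[a-zA-Z]{2,4}$/';

var_dump(preg_match($pattern, 'john@some.domain.co.uk')); // int(1)
var_dump(preg_match($pattern, 'john@example.co.uk'));     // int(1)
// Still rejected: there is no dot in the domain part.
var_dump(preg_match($pattern, 'a@basd'));                 // int(0)
```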

Rather than rolling your own, perhaps you should read the article How to Find or Validate an Email Address on Regular-Expressions.info. The article also discusses reasons why you might not want to validate an email address using a regular expression and provides 3 regular expressions that you might consider using instead of your own.

5
13.10.2009 18:53:29

A single dot in a regular expression means "match any character". And that's exactly what it does when a top level domain is missing (also when it's present, of course).

Thus you should change your code like this:

if (!preg_match('/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/', $string)) {
    return false;
}

And by the way: a lot more characters are allowed in the local part than what your regular expression currently allows for.

0
13.10.2009 18:56:10
Agreed on your link. I figured though that I should get this working before I start to get more involved and get in way over my head. Also tried your code. Same result, it does not require a dot and validates without it.
canadiancreed 13.10.2009 19:08:12
2
13.10.2009 19:06:49
Will this work for PHP? I ask as it looks to be a Perl module?
canadiancreed 13.10.2009 19:09:11
The Perl module just gives you an easy way of running things through that regular expression.
ceejayoz 13.10.2009 20:37:35

This is the most reasonable trade-off between the spec and real life that I have seen:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+
(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
@
(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+
(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b

Of course, you have to remove the line breaks, and you have to update it if more top-level domains become available.
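
As a rough sketch of how the pieces could be joined in PHP: the ^ and $ anchors, the / delimiters and the i modifier are assumptions added here, not part of the pattern as quoted, and the variable names are made up. Because / is used as the delimiter, the / inside the character classes is escaped, and ' is escaped for the single-quoted strings.

```php
<?php
// The five lines of the pattern above, one array entry per line.
$parts = [
    '[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+',
    '(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*',
    '@',
    '(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+',
    '(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b',
];

// Join without line breaks; anchors and the i modifier are assumptions.
$pattern = '/^' . implode('', $parts) . '$/i';

var_dump(preg_match($pattern, 'john@example.co.uk')); // int(1)
var_dump(preg_match($pattern, 'a@b'));                // int(0)
```

Keeping the pieces in an array also makes it easier to update the list of top-level domains later.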

1
13.10.2009 19:15:34

From the page Comparing E-mail Address Validating Regular Expressions: Geert De Deckere from the Kohana project has developed a near perfect one:

/^[-_a-z0-9\'+*$^&%=~!?{}]++(?:\.[-_a-z0-9\'+*$^&%=~!?{}]+)*+@(?:(?![-.])[-a-z0-9.]+(?<![-.])\.[a-z]{2,6}|\d{1,3}(?:\.\d{1,3}){3})(?::\d++)?$/iD

There is also a built-in PHP function, filter_var($email, FILTER_VALIDATE_EMAIL), but it seems to be under development. Another serious option is PEAR's Validate package. I think the PEAR solution is the best one.
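
For reference, a minimal sketch of the filter_var approach. The wrapper name is_valid_email is hypothetical; filter_var and FILTER_VALIDATE_EMAIL are the PHP built-ins named above. Exactly which addresses are accepted depends on the PHP version, so treat the sample calls as illustrative only.

```php
<?php
// Hypothetical helper; not part of PHP itself.
function is_valid_email(string $email): bool
{
    // filter_var() returns the filtered string on success and false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_valid_email('john@example.com')); // bool(true)
var_dump(is_valid_email('no-at-sign'));       // bool(false)
```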

3
13.10.2009 19:30:53
I've run into some limitations of the filter_var one (unlimited top domain sizes for one) so I'll give the PEAR one a shot. Thanks!
canadiancreed 13.10.2009 20:33:25
What are "unlimited top domain sizes"? It has come to my understanding that a TLD can be up to 6 characters (.museum) and a domain can be up to 63 characters.
ty812 27.10.2009 00:01:55