Using grep to find all emails

How do I properly construct a regular expression for the Linux "grep" program to find all email addresses in, say, the /etc directory? Currently, my script is the following:

grep -srhw "[[:alnum:]]*@[[:alnum:]]*" /etc

It works OK - I see some of the emails, but when I modify it to catch one or more characters before and after the "@" sign ...

grep -srhw "[[:alnum:]]+@[[:alnum:]]+" /etc

.. it stops working altogether.

Also, it doesn't catch emails of the form "Name.LastName@site.com".

Help !


Here is another example

grep -Eiorh '([[:alnum:]_.-]+@[[:alnum:]_.-]+?\.[[:alpha:].]{2,6})' "$@" * | sort | uniq > emails.txt

This variant also works with third-level domains.
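To illustrate why putting punctuation inside the bracket expression helps, here is a minimal sketch. It uses a simplified pattern (not the exact one above) and a made-up sample line; both are assumptions for demonstration only.

```shell
# Hypothetical sample text; the addresses are invented for the demo.
sample='Write to Name.LastName@site.com or admin@mail.example.co.uk today'

# -E: extended regex, -o: print only the matching part of each line.
# The local part and the domain may contain letters, digits, '_', '.' and '-';
# the match must end in a dot followed by 2 to 6 letters.
found=$(printf '%s\n' "$sample" |
  grep -Eo '[[:alnum:]_.-]+@[[:alnum:]_.-]+\.[[:alpha:]]{2,6}')

printf '%s\n' "$found"
```

Because '.' is part of the bracket expression, both the dotted local part and the multi-level domain are picked up as a single match.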


By default grep uses basic regular expressions, in which several special characters, including +, must be escaped to get their special meaning. You'll want to do one of these two:

grep -srhw "[[:alnum:]]\+@[[:alnum:]]\+" /etc

egrep -srhw "[[:alnum:]]+@[[:alnum:]]+" /etc
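The difference can be checked on a throwaway file. This is only a sketch: the file contents are invented, and the `\+` form relies on GNU grep's extension to basic regular expressions.

```shell
# Create a small sample file (hypothetical contents, for illustration only).
tmpdir=$(mktemp -d)
printf 'contact root@localhost today\nno address here\n' > "$tmpdir/sample.txt"

# Basic regex, unescaped +: the + is a literal plus sign, so no line matches.
# (grep exits non-zero when nothing matches, hence the || true.)
bre_plain=$(grep -c '[[:alnum:]]+@[[:alnum:]]+' "$tmpdir/sample.txt" || true)

# Basic regex with \+ (GNU extension): "one or more", one line matches.
bre_escaped=$(grep -c '[[:alnum:]]\+@[[:alnum:]]\+' "$tmpdir/sample.txt" || true)

# Extended regex (-E, same as egrep): + means "one or more" without escaping.
ere=$(grep -Ec '[[:alnum:]]+@[[:alnum:]]+' "$tmpdir/sample.txt" || true)

echo "$bre_plain $bre_escaped $ere"
rm -rf "$tmpdir"
```

On newer systems `grep -E` is generally preferred over `egrep`, which GNU grep treats as an obsolescent alias.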

I modified your regex to include punctuation (like .-_ etc) by changing it to

egrep -ho "[[:graph:]]+@[[:graph:]]+"

This is still pretty clean and matches... well, almost anything with an @ in it, of course. It also matches third-level domains and addresses with '%' or '+' in them. See http://www.delorie.com/gnu/docs/grep/grep_8.html for good documentation on the character classes used.

In my example, the addresses were surrounded by white space, which made matching quite easy. If you grep through a mail server log, for example, you can add < > to make it match only the addresses:

egrep -ho "<[[:graph:]]+@[[:graph:]]+>"
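As a rough sketch of how this might be used on a log, the following builds a tiny fake log (the log lines and addresses are invented, not real mail server output), extracts the bracketed addresses, strips the brackets and removes duplicates:

```shell
# Hypothetical log excerpt, written to a temporary file.
log=$(mktemp)
printf '%s\n' \
  'Jan 1 postfix/qmgr: from=<alice@example.com>, size=100' \
  'Jan 1 postfix/smtp: to=<bob.smith@mail.example.org>, status=sent' \
  'Jan 1 postfix/smtp: to=<alice@example.com>, status=sent' > "$log"

# -o prints only the matched <...> part, tr removes the angle brackets,
# sort -u sorts the result and drops duplicate addresses.
addresses=$(grep -Eho '<[[:graph:]]+@[[:graph:]]+>' "$log" | tr -d '<>' | sort -u)

printf '%s\n' "$addresses"
rm -f "$log"
```

Requiring the surrounding < > keeps neighbouring text such as `from=` and the trailing comma out of the match.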

@thomas, @glowcoder and @oedo are all right. The RFC that defines what an email address can look like is quite a fun read. (I used GNU grep 2.9, as included in Ubuntu, for the examples above.)

Also check out zpea's version below, it should make for a less trigger-happy matcher.
