Non greedy (reluctant) regex matching in sed?
I'm trying to use sed to clean up lines of URLs to extract just the domain.
So from:
http://www.suepearson.co.uk/product/174/71/3816/
I want:
http://www.suepearson.co.uk/
(either with or without the trailing slash, it doesn't matter)
I have tried:
sed 's|\(http://.*?/\).*|\1|'
and (escaping the non-greedy quantifier)
sed 's|\(http://.*\?/\).*|\1|'
but I cannot seem to get the non-greedy quantifier to work, so it always ends up matching the whole string.
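Neither basic nor extended POSIX/GNU regex recognizes the non-greedy quantifier; you need a more modern regex flavor such as Perl's. (In GNU sed's basic syntax a plain ? is an ordinary character and \? means "zero or one", so neither attempt above is a lazy .*.) Fortunately, Perl regex for this context is pretty easy to get: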
perl -pe 's|(http://.*?/).*|$1|'
Try [^/]* instead of .*?:
sed 's|\(http://[^/]*/\).*|\1|g'
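A minimal sketch of why the negated bracket expression works where the question's attempts did not, using plain POSIX basic regex (so it should behave the same in GNU and BSD sed). The variable names are illustrative only.

```shell
#!/bin/sh
# Sample URL from the question.
url='http://www.suepearson.co.uk/product/174/71/3816/'

# The question's first attempt: in a basic regex "?" is an ordinary
# character, so the pattern needs a literal "?" before a "/". There is
# none, the substitution fails, and the line comes back unchanged.
literal=$(printf '%s\n' "$url" | sed 's|\(http://.*?/\).*|\1|')

# A plain greedy .* is no better: the group stretches to the last slash
# on the line, which here is the end of the URL.
greedy=$(printf '%s\n' "$url" | sed 's|\(http://.*/\).*|\1|')

# [^/]* cannot cross a slash, so the group ends at the first slash after
# the host name. No lazy quantifier is needed.
host=$(printf '%s\n' "$url" | sed 's|\(http://[^/]*/\).*|\1|')

printf 'literal: %s\ngreedy:  %s\nhost:    %s\n' "$literal" "$greedy" "$host"
```

The g flag in the command above is harmless but not needed: the trailing .* already consumes the rest of the line, so there is never a second match.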
With sed, I usually implement non-greedy search by searching for anything except the separator until the separator:
echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*\)/.*;\1;p'
Output:
http://www.suon.co.uk
Breaking this down:
- -n turns off sed's automatic printing, and s/<pattern>/<replace>/p substitutes and then prints only the lines where the substitution succeeded.
- ; is used as the command separator instead of / so the slashes in the URL need no escaping; the command becomes s;<pattern>;<replace>;p
- \( ... \) remembers whatever it matches; the saved text is later accessible as \1, \2, ...
- http:// matches itself literally.
- [] is a bracket expression: [ab/] means either a or b or /.
- A ^ at the start of [] means "not", so [^/] means any character except /.
- * repeats the previous item, so [^/]* means a run of characters other than /.
- So far, sed -n 's;\(http://[^/]*\) means: search for http:// followed by any characters except /, and remember what was found.
- The domain ends at the next /, so add that / after the group: sed -n 's;\(http://[^/]*\)/'
- We also want to match the rest of the line after the domain, so add .*
- Group 1 (\1) now holds the domain, so replace the matched line with it and print: sed -n 's;\(http://[^/]*\)/.*;\1;p'

If you want to include the slash after the domain as well, move it inside the group:
echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*/\).*;\1;p'
Output:
http://www.suon.co.uk/
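Because the command combines -n with the p flag, only lines where the substitution succeeded are printed, which is handy when cleaning a whole file of URLs. A minimal sketch on a few made-up input lines (the second URL and the "not a url" line are hypothetical), also showing the same command in extended syntax via -E, which GNU sed and BSD/macOS sed both accept:

```shell
#!/bin/sh
# Hypothetical input: two URLs plus one line that is not an http:// URL.
urls='http://www.suon.co.uk/product/1/7/3/
http://example.com/a/b
not a url'

# -n turns off automatic printing; the p flag prints only lines where the
# substitution succeeded, so "not a url" never reaches the output.
domains=$(printf '%s\n' "$urls" | sed -n 's;\(http://[^/]*/\).*;\1;p')

# Same command in extended syntax (-E): the group parentheses need no
# backslashes, while the back-reference is still written \1.
domains_ere=$(printf '%s\n' "$urls" | sed -E -n 's;(http://[^/]*/).*;\1;p')

printf '%s\n' "$domains"
```

Both variants require a slash after the host name, so a bare http://example.com line would also be dropped.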
