ruby - Understanding negative lookaheads in regular expressions
I want to match URLs that do not contain the string 'localhost' using a Ruby regex.
Based on the answers and comments here, I put together two solutions, both of which seem to work:
Solution A:
(?!.*localhost)^.*$
Example: http://rubular.com/r/tqtbwacl3g
Solution B:
^((?!localhost).)*$
Example: http://rubular.com/r/2kknqzumwf
The problem is that I don't understand what they're doing. For example, according to the docs, ^ can be used in various ways:
[^abc] a single character except: a, b, or c
^ start of line
But I don't see how it's being applied here.
Can someone break down these expressions for me, and explain how they differ from one another?
In both of your cases, ^ is the start of line (since it's not used inside a character class). Since both ^ and the lookahead are zero-width assertions, we can switch them around in the first case. I think that makes it a bit easier to explain:
^(?!.*localhost).*$
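The ^ anchors the expression to the beginning of the string (strictly speaking, in Ruby ^ matches at the start of any line, and \A is the anchor for the start of the string; for a single-line URL the two behave the same). The lookahead then starts from that position and tries to find localhost anywhere in the string (the "anywhere" is taken care of by the .* in front of localhost). If localhost can be found, the subexpression of the lookahead matches, and therefore the negative lookahead causes the pattern to fail. Since the lookahead is bound to start at the beginning of the string by the adjacent ^, this means the pattern overall cannot match. If, however, .*localhost does not match (and hence localhost does not occur in the string), then the lookahead succeeds, and the .*$ simply takes care of matching the rest of the string.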
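To make that concrete, here is a minimal Ruby sketch of the first pattern. The sample URLs and the constant names are made up for illustration and do not come from the question. It also checks that the original ordering, with the lookahead written before ^, gives the same verdicts on these inputs.

```ruby
# Solution A with the anchor written first: ^(?!.*localhost).*$
NO_LOCALHOST_A = /^(?!.*localhost).*$/

# Hypothetical sample URLs, invented for this illustration.
urls = [
  "http://example.com/index.html",
  "http://localhost:3000/index.html",
  "http://example.com/?redirect=localhost"
]

urls.each do |url|
  verdict = NO_LOCALHOST_A.match?(url) ? "match" : "no match"
  puts "#{url} -> #{verdict}"
end

# The ordering from the question: (?!.*localhost)^.*$
# Neither ^ nor the lookahead consumes characters, so both are
# evaluated at the same position and the order does not matter.
ORIGINAL_ORDER = /(?!.*localhost)^.*$/
same = urls.all? { |u| NO_LOCALHOST_A.match?(u) == ORIGINAL_ORDER.match?(u) }
puts "same verdicts: #{same}"
```

Only the first URL should be reported as a match, since the other two contain localhost somewhere after the start of the string.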
Now the other one:
^((?!localhost).)*$
This time the lookahead only checks at the current position (there is no .* inside it). But the lookahead is repeated for every single character, so in the end it does check every single position again. Here is roughly what happens: the ^ makes sure that we're starting at the beginning of the string again. The lookahead checks whether the word localhost is found at that position. If not, all is well, and the . consumes one character. The * then repeats both of those steps. We are now one character further in the string, and the lookahead checks whether the second character starts the word localhost. Again, if not, all is well, and the . consumes another character. This is done for every single character in the string, until we reach the end.
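The walk described above can be sketched as a plain loop. The helper name passes_position_by_position? is hypothetical, and the loop is only meant to mirror the regex for single-line strings (it ignores the fact that . does not match a newline); it is not how the regex engine is actually implemented.

```ruby
NO_LOCALHOST_B = /^((?!localhost).)*$/

# Hypothetical hand-rolled equivalent of ((?!localhost).)* followed by $,
# valid for single-line strings only.
def passes_position_by_position?(str, word = "localhost")
  pos = 0
  while pos < str.length
    # (?!localhost): fail if the word starts exactly at this position
    return false if str[pos, word.length] == word
    pos += 1 # the dot consumes one character
  end
  true # reached the end of the string, which is where $ matches
end

# Sample URLs invented for this illustration.
samples = ["http://example.com/", "http://localhost/", "http://my.localhost.dev/x"]

samples.each do |url|
  puts "#{url}: regex=#{NO_LOCALHOST_B.match?(url)} loop=#{passes_position_by_position?(url)}"
end
```

For each sample the regex and the loop should report the same result, which is the point of the comparison.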
In this particular case both methods are equivalent, and you can select one based on performance (if it matters) or readability (if it doesn't; probably the first one). However, in other cases the second variant is preferable, because it allows you to do this repetition for a fixed part of the string, whereas the first variant will always check the entire string.
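As a rough illustration of that last point, the sketch below applies the second technique to only part of a string. The bracketed input format, the variable names, and the pattern itself are assumptions made up for this example; they are not part of the original question.

```ruby
# Hypothetical input: several bracketed host names on one line.
text = "[example.com] [localhost:3000] [ruby-lang.org]"

# The per-character lookahead is confined to the text between [ and ]:
# each character must not be ] and must not begin the word localhost.
BRACKET_WITHOUT_LOCALHOST = /\[((?:(?!localhost)[^\]])*)\]/

hosts = text.scan(BRACKET_WITHOUT_LOCALHOST).flatten
puts hosts.inspect

# The first variant inspects the whole string, so it rejects this line
# outright because localhost occurs somewhere in it.
whole_line_ok = /^(?!.*localhost).*$/.match?(text)
puts "whole line accepted by variant A: #{whole_line_ok}"
```

Here the scan keeps the two bracketed entries that do not contain localhost, while the first variant cannot make that distinction on a per-entry basis.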