Last updated: Fri, 16 May 2008

preg_match

(PHP 4, PHP 5)

preg_match — Perform a regular expression match

Description

int preg_match ( string $pattern , string $subject [, array &$matches [, int $flags [, int $offset ]]] )

Searches subject for a match to the regular expression given in pattern.

Parameters

pattern

The pattern to search for, as a string.

subject

The input string.

matches

If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.

flags

flags can be the following flag:

PREG_OFFSET_CAPTURE
If this flag is passed, the string offset of every match is returned along with the matched text. Note that this changes the value of matches into an array where every element is an array consisting of the matched string at index 0 and its string offset into subject at index 1.

offset

Normally, the search starts from the beginning of the subject string. The optional parameter offset can be used to specify the alternate place from which to start the search (in bytes).

Note: Using offset is not equivalent to passing substr($subject, $offset) to preg_match() in place of the subject string, because pattern can contain assertions such as ^, $ or (?<=x). Compare:

<?php
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
print_r($matches);
?>

The above example will output:

Array
(
)

while this example

<?php
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, substr($subject, 3), $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>

will produce

Array
(
    [0] => Array
        (
            [0] => def
            [1] => 0
        )

)

Return Values

preg_match() returns the number of times pattern matches. That will be either 0 (no match) or 1, because preg_match() stops searching after the first match. preg_match_all(), by contrast, continues until it reaches the end of subject. preg_match() returns FALSE if an error occurred.
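
Since 0 and FALSE are both falsy, the three possible results are easiest to tell apart with strict comparison. The following is a minimal sketch rather than part of the official examples; the helper name describe_match() is hypothetical.

```php
<?php
// Hypothetical helper: classify the three possible results of preg_match().
function describe_match($pattern, $subject)
{
    // @ suppresses the warning emitted when the pattern fails to compile.
    $result = @preg_match($pattern, $subject);
    if ($result === false) {
        return 'error';
    }
    return $result === 1 ? 'match' : 'no match';
}

echo describe_match('/php/i', 'PHP is great'), "\n";      // match
echo describe_match('/perl/', 'PHP is great'), "\n";      // no match
echo describe_match('/(unclosed/', 'PHP is great'), "\n"; // error
```

A loose check such as if (!preg_match(...)) would treat the error case and the no-match case identically.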

ChangeLog

Version   Description
4.3.3     The offset parameter was added
4.3.0     The PREG_OFFSET_CAPTURE flag was added
4.3.0     The flags parameter was added

Examples

Example #1 Find the string of text "php"

<?php
// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>

Example #2 Find the word "web"

<?php
/* The \b in the pattern indicates a word boundary, so only the distinct
 * word "web" is matched, and not a word partial like "webbing" or "cobweb" */
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}

if (preg_match("/\bweb\b/i", "PHP is the website scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>

Example #3 Getting the domain name out of a URL

<?php
// get host name from URL
preg_match('@^(?:http://)?([^/]+)@i',
    "http://www.php.net/index.html", $matches);
$host = $matches[1];

// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>

The above example will output:

domain name is: php.net

Example #4 Using named subpatterns

<?php

$str = 'foobar: 2008';

preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);

print_r($matches);

?>

The above example will output:

Array
(
    [0] => foobar: 2008
    [name] => foobar
    [1] => foobar
    [digit] => 2008
    [2] => 2008
)

Notes

Tip

Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.
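
As an illustration of the tip above, a plain substring check with strpos() might look like the sketch below. The variable names are only for illustration.

```php
<?php
$haystack = 'PHP is the web scripting language of choice.';

// strpos() returns the byte position of the first occurrence, or FALSE.
// Position 0 is a valid hit, so compare with !== false instead of
// relying on truthiness.
$containsWeb = strpos($haystack, 'web') !== false;
$startsWithPhp = strpos($haystack, 'PHP') === 0;

var_dump($containsWeb);   // bool(true)
var_dump($startsWithPhp); // bool(true)
```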



User Contributed Notes
Norbert
06-May-2008 02:00
Debian way is:
dpkg-reconfigure locales
Georg
04-Apr-2008 11:36
In addition to reiner-keller's comment about Umlaute using setlocale (LC_ALL, 'de_DE');

To enable 'de_DE' on my Debian 4 machine I first had to:
- uncomment 'de_DE' in file /etc/locale.gen and afterwards
- run locale-gen from the shell
alex at akb dot com dot au
18-Mar-2008 11:55
Thought this might be helpful to those writing code for use with Valve's Steam IDs:

<?php

$steam_id = "STEAM_0:1:1234567890";
$pattern = "/^STEAM_[0-2]:[0-2]:[0-9]{1,10}$/";

if (preg_match($pattern, $steam_id)) {
    echo "Valid Steam ID";
} else {
    echo "Not Valid Steam ID";
}

// output:
// Valid Steam ID

?>
Paperweight
11-Mar-2008 06:38
Beat this modern email address matcher...

/^([a-z0-9]([a-z0-9_-]*\.?[a-z0-9])*)(\+[a-z0-9]+)?
@([a-z0-9]([a-z0-9-]*[a-z0-9])*\.)*
([a-z0-9]([a-z0-9-]*[a-z0-9]+)*)\.[a-z]{2,6}$/

(Remove the line-breaks!)
Its only bug is it thinks single-number top-level domains are ok. Can you find any others?
andy at jmedia web dawt com
07-Mar-2008 05:21
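If you want to test for FALSE, use === instead of ==. A return value of 0 (no match) also compares equal to FALSE under ==, so only === distinguishes an error from a failed match.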

$result = preg_match("/badtest/J",$string);
if($result === FALSE) {
   // bad query
   error_log("Whoops!");
}  else {
   echo("Matched " . $result . " times");
}
Thijs van Beek
05-Mar-2008 11:56
In reply to the comment below about the validation of phone numbers:

PEAR offers some brilliant classes for phone number validation.

Check out http://pear.php.net/packages.php?catpid=50&catname=Validate

Regards
Thijs
joseph at no-spam dot xylex dot net
28-Dec-2007 09:01
A quick example of using named recursion and negative lookaheads for finding the outermost div. You can use this same idea for any type of nested tags.

<?php
$sample = "lead in text to capture <div>
    outside div text
    <div>
        inner div text
        <div>
            deep nested text
        </div>
    </div>
    bottom of outside div text
</div> end of text to capture";

preg_match(
    '#^(?P<a>.*?)(?P<b>.?<div((.(?!<div))|(?P>b))*?.</div>)(?P<c>.*?)$#s',
    $sample, $matches);
echo "<pre>";

var_dump($matches);
// $matches['a'] == "lead in text to capture"
// $matches['b'] == the outermost <div> and child contents (with a leading space)
// $matches['c'] == " end of text to capture"
?>
ahigerd at timeips dot com
27-Dec-2007 07:19
One note on the regular expressions provided that claim to validate e-mail addresses: They're incomplete. To quote the note submission page, just a couple paragraphs above the box where you type in your note:

"(And if you're posting an example of validating email addresses, please don't bother. Your example is almost certainly wrong for some small subset of cases. See this information from O'Reilly Mastering Regular Expressions book [http://examples.oreilly.com/regex/readme.html] for the gory details.)"

That said, the expressions as provided aren't COMPLETELY irrelevant -- they WILL validate MOST e-mail addresses, and you won't really be blocking any significant portion of the population by using them. Just be aware of the limitations.
bjorn dot padding at gmail dot com
18-Dec-2007 12:52
To test if a regular expression is syntactically correct:

<?php
function preg_test($regex)
{
    // preg_match() returns FALSE (not 0) when the pattern fails to compile
    if (@preg_match($regex, '') === false) {
        $error = error_get_last();
        // Skip the first 70 characters of the warning text, as in the
        // author's setup; adjust the offset if your messages differ
        throw new Exception(substr($error['message'], 70));
    }
    return true;
}
?>

usage:

<?php
if (preg_test('/.*/i')) {
    print "correct!";
}

// Prints "correct!"
?>

<?php
if (preg_test('/.**/i')) {
    print "correct!";
}

// Throws exception with message 'Compilation failed: nothing to repeat at offset 2'
?>
boyan7640 at gmail dot com
19-Nov-2007 11:19
If you try to find the offset when searching in a UTF-8 string (containing multibyte characters, such as Cyrillic characters) with preg_match, using the PREG_OFFSET_CAPTURE flag, you may get a different result from what you expect.

First of all you must compile PHP with multibyte support (mbstring). Then you must either call the multibyte functions (mb_*) directly or enable them through runtime configuration (php.ini, Apache vhost configuration file, .htaccess or somewhere else):
     php_value           default_charset UTF-8
     php_value           mbstring.func_overload 7
     php_value           mbstring.internal_encoding UTF-8
     php_value           mbstring.detect_order UTF-8

When using preg_match with the PREG_OFFSET_CAPTURE flag and a UTF-8 string, the function counts bytes and NOT characters, so a multibyte character may add 2 bytes to the offset rather than 1 character. That's why the offset will be larger than you expect.

My simple solution is using mb_strpos:
     ...
     preg_match($pattern, $found_text, $matches, PREG_OFFSET_CAPTURE);
     // This will convert $matches[0][1] multibyte byte length to multibyte character length (UTF-8)
     $matches[0][1] = mb_strpos($found_text, $matches[0][0]);
     ...

P.S. The $pattern variable must use "/u" switch for Unicode!!!

-------------------------------------------------
PHP Version 5.2.4
Multibyte regex (oniguruma) version 4.4.4
-------------------------------------------------
Who Needs Email at Reg dot Ex
26-Aug-2007 08:17
regex for validating emails, from Perl's RFC2822 package:
 
http://en.wikipedia.org/wiki/Talk:E-mail_address
razortongue
26-Jul-2007 01:47
Maybe it will sound obvious, but I've encountered this a few times...

If you are using preg_match() to validate user input, remember to include ^ and $ in your regex, or take the input from $matches[0] after successfully matching the pattern. For example,
preg_match('/[0-9]+/', '123 UNION SELECT ... --') returns 1 (a match), but if you use the whole input in an SQL statement, the injected code will probably be executed (unless you escape the user-supplied argument). Note that $matches[0] == '123', so it can be used as valid input.
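
The difference between an unanchored and an anchored check can be sketched as follows. This sketch uses \A and \z rather than the ^ and $ named above, because $ also matches just before a trailing newline; that substitution is a choice made for this illustration.

```php
<?php
$input = '123 UNION SELECT ... --';

// Unanchored: succeeds because the pattern matches somewhere in the input.
$loose = preg_match('/[0-9]+/', $input, $matches);

// Anchored: succeeds only if the entire input consists of digits.
$strict = preg_match('/\A[0-9]+\z/', $input);

var_dump($loose);      // int(1)
var_dump($matches[0]); // string(3) "123"
var_dump($strict);     // int(0)
```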
David W.
10-Jul-2007 10:33
I just started using PHP and this section doesn't clarify whether or not you must use "/" as your regular expression delimiters.

I want to clarify that you can use almost any character as your delimiter: anything that is not alphanumeric, a backslash, or whitespace. The delimiter is automatically the first character of your regular expression string. This makes it a bit easier if you are looking for things that might contain a forward slash. For example:

preg_match('#</b>#', $string);

Instead of:

preg_match('/<\/b>/', $string);

Or:

preg_match('@/my/dir/name/@', $string);

Instead of:

preg_match('/\/my\/dir\/name\//', $string);

This can greatly boost readability. Not quite as flexible as in Perl (You can't use control characters or \n which can really come in handy when you aren't quite sure what characters might be in your regular expression), but switching to another delimiter can make your code a bit easier to read.
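
When the text to match is not known in advance, choosing a delimiter by hand is not always possible. A minimal sketch of one way to handle that, using preg_quote() with its optional delimiter argument (the sample string is made up for illustration):

```php
<?php
$string = 'See <b>/my/dir/name/</b> for details.';

// With "#" as the delimiter, forward slashes need no escaping.
var_dump(preg_match('#</b>#', $string));   // int(1)

// preg_quote() escapes regex metacharacters in literal text; its second
// argument names the delimiter so that it is escaped as well.
$literal = '/my/dir/name/';
$pattern = '/' . preg_quote($literal, '/') . '/';
echo $pattern, "\n";                       // /\/my\/dir\/name\//
var_dump(preg_match($pattern, $string));   // int(1)
```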
Izzy
17-Aug-2006 09:27
Concerning the German umlauts (and other language-specific chars as accented letters etc.): If you use unicode (utf-8), you can match them easily with the unicode character property \pL (match any unicode letter) and the "u" modifier, so e.g.

<?php preg_match("/[\w\pL]/u",$var); ?>

would really match all "words" in $var, whether they contain umlauts or not. Took me a while to figure this out, so maybe this comment will save the day for someone else :-)
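
A small sketch of the \pL approach, assuming PCRE was built with UTF-8 and Unicode property support (the sample name is written as explicit UTF-8 bytes so the result does not depend on the encoding of the source file):

```php
<?php
// "Müller" as UTF-8 bytes.
$name = "M\xC3\xBCller";

// With /u and \pL, any Unicode letter is accepted.
$lettersOnly = preg_match('/^[\w\pL]+$/u', $name);

// Punctuation is neither \w nor \pL, so this subject is rejected.
$withPunctuation = preg_match('/^[\w\pL]+$/u', $name . '!');

var_dump($lettersOnly);     // int(1)
var_dump($withPunctuation); // int(0)
```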
patrick at procurios dot nl
29-Jan-2006 07:17
This is the only function in which the assertion \G can be used in a regular expression. \G matches only if the current position in 'subject' is the same as specified by the index 'offset'. It is comparable to the ^ assertion, but whereas ^ matches at position 0, \G matches at position 'offset'.
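
The contrast between ^ and \G with a non-zero offset can be sketched like this, reusing the "abcdef" subject from the manual text above:

```php
<?php
$subject = 'abcdef';

// ^ still refers to the start of the subject, so it cannot match at offset 3.
var_dump(preg_match('/^def/', $subject, $m, 0, 3));  // int(0)

// \G matches exactly at the starting offset.
var_dump(preg_match('/\Gdef/', $subject, $m, 0, 3)); // int(1)

// \G does not let the search slide forward: "def" begins at 3, not at 2.
var_dump(preg_match('/\Gdef/', $subject, $m, 0, 2)); // int(0)
```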
info at reiner-keller dot de
12-Feb-2005 07:03
Referring to the post by "internet at sourcelibre dot com": instead of listing characters such as the German umlauts explicitly in the regular expression, like

<?php

$bolMatch = preg_match("/^[a-zA-ZäöüÄÖÜ]+$/", $strData);

?>

use the setlocale() function and the POSIX character class, like

<?php

setlocale(LC_ALL, 'de_DE');
$bolMatch = preg_match("/^[[:alpha:]]+$/", $strData);

?>

This works for any country-specific special character set.

Remember that since domains containing umlauts have been introduced, it is almost mandatory to update your regular expressions so that such domains are accepted in your forms (e-mail and internet addresses).

Life can be so easy when reading the manual ;-)
hfuecks at phppatterns dot com
13-Jan-2005 02:11
Note that the PREG_OFFSET_CAPTURE flag, as far as I've tested, returns the offset in bytes not characters, which may not be what you're expecting if you're using the /u pattern modifier to make the regex UTF-8 aware (i.e. multibyte characters will result in a greater offset than you expect)
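
One way to turn such a byte offset into a character offset is to count the characters in the part of the subject that precedes the match. The sketch below does this with PCRE alone; the helper name byte_to_char_offset() is hypothetical, and where mbstring is available mb_strlen() on the same prefix should serve the same purpose.

```php
<?php
// Hypothetical helper: convert a byte offset within a UTF-8 string to a
// character (code point) offset by counting the characters before it.
function byte_to_char_offset($subject, $byteOffset)
{
    $prefix = substr($subject, 0, $byteOffset);
    // With /u and /s, "." matches exactly one code point, newlines included.
    return preg_match_all('/./us', $prefix, $unused);
}

$subject = "\xC3\xA4\xC3\xB6\xC3\xBC abc"; // "äöü abc" as UTF-8 bytes
preg_match('/abc/u', $subject, $matches, PREG_OFFSET_CAPTURE);

$byteOffset = $matches[0][1];
echo $byteOffset, "\n";                                // 7
echo byte_to_char_offset($subject, $byteOffset), "\n"; // 4
```

Unlike a search for the matched text, this works on the offset itself, so it stays correct when the same text occurs earlier in the subject.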
nico at kamensek dot de
17-Jan-2004 08:31
As I did not find any working IPv6 regexp, I just created one. Here it is:

$pattern1 = '([A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}';
$pattern2 = '[A-Fa-f0-9]{1,4}::([A-Fa-f0-9]{1,4}:){0,5}[A-Fa-f0-9]{1,4}';
$pattern3 = '([A-Fa-f0-9]{1,4}:){2}:([A-Fa-f0-9]{1,4}:){0,4}[A-Fa-f0-9]{1,4}';
$pattern4 = '([A-Fa-f0-9]{1,4}:){3}:([A-Fa-f0-9]{1,4}:){0,3}[A-Fa-f0-9]{1,4}';
$pattern5 = '([A-Fa-f0-9]{1,4}:){4}:([A-Fa-f0-9]{1,4}:){0,2}[A-Fa-f0-9]{1,4}';
$pattern6 = '([A-Fa-f0-9]{1,4}:){5}:([A-Fa-f0-9]{1,4}:){0,1}[A-Fa-f0-9]{1,4}';
$pattern7 = '([A-Fa-f0-9]{1,4}:){6}:[A-Fa-f0-9]{1,4}';

Patterns 1 to 7 represent different cases. $full is the complete pattern, which should work for all correct IPv6 addresses. (Keep it on a single line; a line break inside the string would become part of the pattern.)

$full = "/^($pattern1)$|^($pattern2)$|^($pattern3)$|^($pattern4)$|^($pattern5)$|^($pattern6)$|^($pattern7)$/";
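
As an alternative to a hand-written pattern, and not the approach of the note above, the filter extension (bundled since PHP 5.2) can validate IPv6 addresses. This is a sketch assuming that extension is enabled; the wrapper name is_ipv6() is hypothetical. It also accepts forms that begin with "::", such as "::1", which the seven cases above do not appear to cover.

```php
<?php
// Hypothetical wrapper around filter_var() for IPv6 validation.
function is_ipv6($address)
{
    return filter_var($address, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false;
}

var_dump(is_ipv6('2001:0db8:85a3:0000:0000:8a2e:0370:7334')); // bool(true)
var_dump(is_ipv6('2001:db8::1'));                             // bool(true)
var_dump(is_ipv6('::1'));                                     // bool(true)
var_dump(is_ipv6('192.168.0.1'));                             // bool(false)
var_dump(is_ipv6('2001:db8:::1'));                            // bool(false)
```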
thivierr at telus dot net
23-Nov-2003 10:23
A web server log record can be parsed as follows:

$line_in = '209.6.145.47 - - [22/Nov/2003:19:02:30 -0500] "GET /dir/doc.htm HTTP/1.0" 200 6776 "http://search.yahoo.com/search?p=key+words=UTF-8" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"';

if (preg_match('!^([^ ]+) ([^ ]+) ([^ ]+) \[([^\]]+)\] "([^ ]+) ([^ ]+) ([^/]+)/([^"]+)" ([^ ]+) ([^ ]+) ([^ ]+) (.+)!',
  $line_in,
  $elements))
{
  print_r($elements);
}

Array
(
    [0] => 209.6.145.47 - - [22/Nov/2003:19:02:30 -0500] "GET /dir/doc.htm HTTP/1.0" 200 6776 "http://search.yahoo.com/search?p=key+words=UTF-8" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
    [1] => 209.6.145.47
    [2] => -
    [3] => -
    [4] => 22/Nov/2003:19:02:30 -0500
    [5] => GET
    [6] => /dir/doc.htm
    [7] => HTTP
    [8] => 1.0
    [9] => 200
    [10] => 6776
    [11] => "http://search.yahoo.com/search?p=key+words=UTF-8"
    [12] => "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
)

Notes: 
1) For the referer field ($elements[11]), I intentionally capture the double quotes (") and don't use them as delimiters, because sometimes double quotes do appear in a referer URL. Double quotes can appear as %22 or \". Both have to be handled correctly. So, I strip off the double quotes in a second step.
2) The URLs should be further parsed using parse_url, which is quicker and more reliable than preg_match.
3) I assume the requested protocol (HTTP/1.1) always has a slash character in the middle, which might not always be the case, but I'll take the risk.
4) The agent field ($elements[12]) is the most unstructured field, so I make no assumptions about its format. If the record is truncated, the agent field will not be delimited properly with a quote at the end. So, both cases must be handled.
5) A hyphen (- or "-") means a field has no value. It is necessary to convert these to an appropriate value (such as an empty string, null, or 0).
6) Finally, there should be appropriate code to handle malformed web log entries, which are common due to junk data. I never assume I've seen all cases.
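
Notes 1 and 5 describe a second clean-up pass over the captured fields. A minimal sketch of such a pass follows; the helper name clean_log_field() is hypothetical, and mapping a hyphen to null is just one of the choices listed in note 5.

```php
<?php
// Hypothetical helper: normalise one captured log field.
function clean_log_field($value)
{
    // Strip one pair of surrounding double quotes, if present (note 1).
    // A truncated field with no closing quote is left untouched (note 4).
    if (strlen($value) >= 2 && $value[0] === '"' && substr($value, -1) === '"') {
        $value = substr($value, 1, -1);
    }
    // A lone hyphen means the field has no value (note 5).
    return $value === '-' ? null : $value;
}

var_dump(clean_log_field('-'));                      // NULL
var_dump(clean_log_field('"-"'));                    // NULL
echo clean_log_field('"http://www.php.net/"'), "\n"; // http://www.php.net/
echo clean_log_field('6776'), "\n";                  // 6776
```

It could be applied to every captured element with array_map('clean_log_field', $elements).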
bjorn at kulturkonsult dot no
01-Apr-2003 03:56
If you want to match all Scandinavian characters (æÆøØåÅöÖäÄ) in addition to those matched by \w, you might want to use this regexp:

/^[\w\xe6\xc6\xf8\xd8\xe5\xc5\xf6\xd6\xe4\xc4]+$/

Remember that \w respects the current locale used in PCRE's character tables.
