Last updated: Sat, 07 Jan 2012
html_entity_decode

(PHP 4 >= 4.3.0, PHP 5)

html_entity_decode: Converts all named HTML entities to their corresponding characters

Description

string html_entity_decode ( string $string [, int $quote_style = ENT_COMPAT [, string $charset = 'UTF-8' ]] )

html_entity_decode() is the counterpart of htmlentities(): it converts all named HTML entities in string back to their corresponding characters.

Parameters

string

The input string.

quote_style

The optional second parameter quote_style lets you decide what happens to 'single' and "double" quotes. You can use one of the following three constants:

Available quote_style constants

Constant name   Description
ENT_COMPAT      Converts double quotes and leaves single quotes alone.
ENT_QUOTES      Converts both double and single quotes.
ENT_NOQUOTES    Leaves both double and single quotes unconverted.
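
As a small sketch of the three constants in the table above: the sample string and the variable name are made up for illustration, and the charset is passed explicitly so the result does not depend on the default of the PHP version in use.

```php
<?php
// Illustrative input only: one pair of double quotes and one pair of
// single quotes, both written as entities.
$encoded = '&quot;double&quot; and &#039;single&#039;';

// ENT_COMPAT decodes the double quotes only.
echo html_entity_decode($encoded, ENT_COMPAT, 'UTF-8'), "\n";

// ENT_QUOTES decodes both kinds of quotes.
echo html_entity_decode($encoded, ENT_QUOTES, 'UTF-8'), "\n";

// ENT_NOQUOTES leaves both kinds of quote entities untouched.
echo html_entity_decode($encoded, ENT_NOQUOTES, 'UTF-8'), "\n";
```

Entities other than quotes are decoded the same way regardless of which constant is chosen.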

charset

This parameter specifies the character set used for the conversion. If an empty string is passed, a charset is chosen automatically based on the current mbstring charset and the current language setting.

The following charsets are supported in PHP 4.3.0 and later:

Supported charsets

Charset       Aliases                        Description
ISO-8859-1    ISO8859-1                      Western European, Latin-1.
ISO-8859-15   ISO8859-15                     Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8         (none)                         ASCII-compatible multi-byte 8-bit Unicode.
cp866         ibm866, 866                    DOS-specific Cyrillic charset. Supported as of PHP 4.3.2.
cp1251        Windows-1251, win-1251, 1251   Windows-specific Cyrillic charset. Supported as of PHP 4.3.2.
cp1252        Windows-1252, 1252             Windows-specific charset for Western European languages.
KOI8-R        koi8-ru, koi8r                 Russian. Supported as of PHP 4.3.2.
BIG5          950                            Traditional Chinese, mainly used in Taiwan.
GB2312        936                            Simplified Chinese, national standard character set.
BIG5-HKSCS    (none)                         Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS     SJIS, 932                      Japanese
EUC-JP        EUCJP                          Japanese

Note: Other charsets are not implemented; ISO-8859-1 is used in their place.

Return Values

Returns the decoded string.

Changelog

Version Description
5.4.0 The default charset was changed from ISO-8859-1 to UTF-8.
5.0.0 Support for multi-byte charsets was added.

Examples

Example #1 Decoding named HTML entities

<?php
$orig = "I'll \"walk\" the <b>dog</b> now";

$a = htmlentities($orig);

$b = html_entity_decode($a);

echo $a; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now

echo $b; // I'll "walk" the <b>dog</b> now


// Users of PHP versions before 4.3.0 can use the following workaround.
// Note: it relies on the /e modifier, which was deprecated in PHP 5.5.0
// and removed in PHP 7.0.0.
function unhtmlentities($string)
{
    // replace numeric entities
    $string = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $string);
    $string = preg_replace('~&#([0-9]+);~e', 'chr("\\1")', $string);
    // replace literal entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}

$c = unhtmlentities($a);

echo $c; // I'll "walk" the <b>dog</b> now

?>

Notes

Note:

You might wonder why trim(html_entity_decode('&nbsp;')); does not reduce the string to an empty string. The reason is that '&nbsp;' does not correspond to the character with ASCII code 32 (which trim() strips) but to the character with code 160 (0xa0) in the default ISO 8859-1 character set.
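
One possible way to strip non-breaking spaces as well is sketched here. It assumes the text is decoded to UTF-8, where the character is U+00A0; the helper name trim_nbsp_utf8() is made up for this illustration and is not part of PHP.

```php
<?php
// Hypothetical helper (not part of PHP): removes ordinary whitespace and
// non-breaking spaces (U+00A0) from both ends of a UTF-8 string.
// The /u modifier makes the pattern operate on UTF-8 characters, not bytes.
function trim_nbsp_utf8($text)
{
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text);
}

$decoded = html_entity_decode('&nbsp;Hello&nbsp;', ENT_QUOTES, 'UTF-8');

var_dump(trim($decoded) === 'Hello');           // bool(false): trim() keeps U+00A0
var_dump(trim_nbsp_utf8($decoded) === 'Hello'); // bool(true)
```

A regular expression is used instead of a trim() character list because trim() works on single bytes and could cut a multi-byte UTF-8 character in half.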

User Contributed Notes: html_entity_decode
Victor 14-Dec-2011 07:22
We were having very peculiar behavior regarding foreign characters such as e-acute.

However, it was only showing up as a problem when extracting those characters out of our mysql database and when being displayed through a proxy server of ours that handles dns issues.

As other users have made a note of, the default character setting wasn't what they were expecting it to be when they left theirs blank.

When we changed our default_charset to "UTF-8", the problems went away and we no longer needed functions like these to handle foreign characters such as e-acute. Good enough for us!
Martin 26-Jun-2011 06:37
If you need something that converts &#[0-9]+ entities to UTF-8, this is simple and works:

<?php
/* Entity crap. */
$input = "Fovi&#269;";

$output = preg_replace_callback("/(&#[0-9]+;)/", function($m) { return mb_convert_encoding($m[1], "UTF-8", "HTML-ENTITIES"); }, $input);

/* Plain UTF-8. */
echo $output;
?>
John_Schlick at hotmail dot com 16-Feb-2011 03:39
BE AWARE:  The documentation around the default charset might be wrong.

The changelog says:
5.3.3     Default charset changed from ISO-8859-1 to UTF-8.

Despite the fact that we are running 5.3.3-7 when we do

html_entity_decode("&nbsp;", ENT_QUOTES);
we get "\xa0" the ISO-8859-1 version of a non breaking space.

When we change this to:
html_entity_decode("&nbsp;", ENT_QUOTES, 'UTF-8');
we properly get "\xc2\xa0"

Implying that 'UTF-8' is NOT the default for our installation of php.
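
A quick way to check what an installation actually returns is to dump the bytes. This is only a sketch; the variable names are illustrative, and the charset is passed explicitly in both calls so the comparison does not depend on the default.

```php
<?php
// Decode the same entity into two different target charsets.
$latin1 = html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1');
$utf8   = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');

// bin2hex() makes the otherwise invisible bytes readable.
echo bin2hex($latin1), "\n"; // a0
echo bin2hex($utf8), "\n";   // c2a0
```

Running bin2hex() on the result of a call without the third argument shows which of the two the local default corresponds to.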
neurotic dot neu at gmail dot com 10-Aug-2010 02:25
This is a safe rawurldecode with utf8 detection:

<?php
function utf8_rawurldecode($raw_url_encoded){
    $enc = rawurldecode($raw_url_encoded);
    // If a UTF-8 -> ISO-8859-1 -> UTF-8 round trip leaves the string
    // unchanged, treat it as already being UTF-8.
    if (utf8_encode(utf8_decode($enc)) == $enc) {
        return $enc;
    } else {
        return utf8_encode($enc);
    }
}
?>
Free at Key dot no 01-Jul-2010 07:51
Handy function to convert remaining HTML-entities into human readable chars (for entities which do not exist in target charset):

<?php
function cleanString($in, $offset = 0)
{
    $out = trim($in);
    if (!empty($out))
    {
        $entity_start = strpos($out, '&', $offset);
        if ($entity_start === false)
        {
            // ideal: no entity left
            return $out;
        }
        else
        {
            $entity_end = strpos($out, ';', $entity_start);
            if ($entity_end === false)
            {
                return $out;
            }
            // too long to be an entity
            else if ($entity_end > $entity_start + 7)
            {
                // keep going
                $out = cleanString($out, $entity_start + 1);
            }
            // gotcha!
            else
            {
                $clean = substr($out, 0, $entity_start);
                $subst = substr($out, $entity_start + 1, 1);
                // &scaron; => "s" / &#353; => "_"
                $clean .= ($subst != "#") ? $subst : "_";
                $clean .= substr($out, $entity_end + 1);
                // keep going
                $out = cleanString($clean, $entity_start + 1);
            }
        }
    }
    return $out;
}
?>
Matt Robinson 06-Sep-2009 04:11
I wrote in a previous comment that html_entity_decode() only handled about 100 characters. That's not quite true; it only handles entities that exist in the output character set (the third argument). If you want to get ALL HTML entities, make sure you use ENT_QUOTES and set the third argument to 'UTF-8'.

If you don't want a UTF-8 string, you'll need to convert it afterward with something like utf8_decode(), iconv(), or mb_convert_encoding().

If you're producing XML, which doesn't recognise most HTML entities:

When producing a UTF-8 document (the default), use htmlspecialchars(html_entity_decode($string, ENT_QUOTES, 'UTF-8'), ENT_NOQUOTES, 'UTF-8'). You only need to escape <, > and & unless you're printing inside the XML tags themselves.

Otherwise, either convert all the named entities to numeric ones, or declare the named entities in the document's DTD. The full list of 252 entities can be found in the HTML 4.01 Spec, or you can cut and paste the function from my site (http://inanimatt.com/php-convert-entities.php).
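
The UTF-8 case above can be sketched as follows. The function name entities_to_xml_text() and the sample input are made up for this illustration.

```php
<?php
// Hypothetical helper (not part of PHP): turns HTML text with named
// entities into UTF-8 text that is safe as XML element content.
function entities_to_xml_text($html)
{
    // Step 1: decode every HTML entity to a real UTF-8 character.
    $decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    // Step 2: re-escape only <, > and &, which XML always understands.
    return htmlspecialchars($decoded, ENT_NOQUOTES, 'UTF-8');
}

// &eacute; becomes a literal UTF-8 character, while & and the angle
// brackets come out as &amp;, &lt; and &gt;.
echo entities_to_xml_text('caf&eacute; &amp; &lt;b&gt;'), "\n";
```

Because quotes are left unescaped (ENT_NOQUOTES), the result is meant for element content, not for attribute values.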
marion at figmentthinking dot com 10-Mar-2009 06:11
I just ran into the:
Bug #27626 html_entity_decode bug - cannot yet handle MBCS in html_entity_decode()!

The simple solution if you're still running PHP 4 is to wrap the html_entity_decode() call in utf8_encode().

<?php
$string = '&nbsp;';
$utf8_encode = utf8_encode(html_entity_decode($string));
?>

By default html_entity_decode() returns ISO-8859-1, which utf8_encode() converts to UTF-8. utf8_decode() does the reverse:

http://us.php.net/manual/en/function.utf8-decode.php
"Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1"
jl dot garcia at gmail dot com 04-Mar-2009 04:33
I created this function to filter all the text that goes in or comes out of the database.

<?php
function filter_string($string, $nohtml='', $save='') {
    if(!empty($nohtml)) {
        $string = trim($string);
        if(!empty($save)) $string = htmlentities(trim($string), ENT_QUOTES, 'ISO-8859-15');
        else $string = html_entity_decode($string, ENT_QUOTES, 'ISO-8859-15');
    }
    if(!empty($save)) $string = mysql_real_escape_string($string);
    else $string = stripslashes($string);
    return($string);
}
?>
Anonymous 31-Jul-2008 12:01
You may want to specify the character set if you see unexpected behavior.  Here is an example.

# cat test.php
<?php
$str = '&#33;';
$quotes = html_entity_decode($str, ENT_QUOTES);
$noquotes = html_entity_decode($str, ENT_NOQUOTES);
$noquotesutf8 = html_entity_decode($str, ENT_NOQUOTES, 'UTF-8');
echo "quotes='$quotes', noquotes='$noquotes', noquotesutf8='$noquotesutf8'\n";
?>

# php test.php
quotes='!', noquotes='&#33;', noquotesutf8='!'
kae at verens dot com 09-May-2008 08:11
the references to 'chr()' in the example unhtmlentities() function should be changed to unichr, using the example unichr() function described in the 'chr' reference (http://php.net/chr).

the reason for this is characters such as &#x20AC; which do not break down into an ASCII number (that's the Euro, by the way).
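
A small sketch of the difference, with the Euro sign as the example. The mask `& 0xFF` is written out here only to make visible that chr() can never produce more than one byte.

```php
<?php
// chr() yields a single byte, so at most the low 8 bits of the
// code point U+20AC survive: the result is the lone byte 0xAC.
echo bin2hex(chr(0x20AC & 0xFF)), "\n"; // ac

// Decoding the entity to UTF-8 keeps the whole code point as a
// three-byte sequence.
echo bin2hex(html_entity_decode('&#x20AC;', ENT_QUOTES, 'UTF-8')), "\n"; // e282ac
```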
me at richardsnazell dot com 21-Jan-2008 05:19
I had a problem getting the 'TM' trademark symbol to display correctly in an email subject line. Using html_entity_decode() with different charsets didn't work, but directly replacing the entity with its Windows-1252 equivalent (byte 153) did:

$subject = str_replace('&trade;', chr(153), $subject);
jojo 03-Nov-2006 09:27
This decodes strings that were encoded with JavaScript's escape() function.
It is useful when the page uses multibyte characters.

javascript escape('aaああaa') ..... 'aa%u3042%u3042aa'
php  jsEscape_decode('aa%u3042%u3042aa')..'aaああaa'

<?php
function jsEscape_decode($jsEscaped, $outCharCode = 'SJIS'){
    $arrMojis = explode("%u", $jsEscaped);
    for ($i = 1; $i < count($arrMojis); $i++){
        $c = substr($arrMojis[$i], 0, 4);
        $cc = mb_convert_encoding(pack('H*', $c), $outCharCode, 'UTF-16');
        $arrMojis[$i] = substr_replace($arrMojis[$i], $cc, 0, 4);
    }
    return implode('', $arrMojis);
}
?>
romekt at CUTTHISgmail dot com 01-Sep-2006 04:15
here's a simple workaround for the UTF-8 support problem

<?php
$var = iconv("UTF-8", "ISO-8859-1", $var);
$var = html_entity_decode($var, ENT_QUOTES, 'ISO-8859-1');
$var = iconv("ISO-8859-1", "UTF-8", $var);
?>
grvg (at) free (dot) fr 29-Jul-2006 11:44
Here are the ultimate functions to convert HTML entities to UTF-8:
The main function is htmlentities2utf8
Others are helper functions

<?php
function chr_utf8($code)
{
    // Windows-1252 bytes 128-159 mapped to their Unicode code points.
    // Codes missing from this table (129, 141, 143, 144, 157) are not
    // assigned in Windows-1252 and fall back to 160 (non-breaking space).
    static $windows_1252 = array(
        128 => 8364, 130 => 8218, 131 => 402,  132 => 8222, 133 => 8230,
        134 => 8224, 135 => 8225, 136 => 710,  137 => 8240, 138 => 352,
        139 => 8249, 140 => 338,  142 => 381,  145 => 8216, 146 => 8217,
        147 => 8220, 148 => 8221, 149 => 8226, 150 => 8211, 151 => 8212,
        152 => 732,  153 => 8482, 154 => 353,  155 => 8250, 156 => 339,
        158 => 382,  159 => 376,
    );

    $code = (int) $code;
    if ($code < 0) return false;
    elseif ($code < 128) return chr($code);
    elseif ($code < 160) // replace illegal Windows characters
    {
        $code = isset($windows_1252[$code]) ? $windows_1252[$code] : 160;
    }
    if ($code < 2048) return chr(192 | ($code >> 6)) . chr(128 | ($code & 63));
    elseif ($code < 65536) return chr(224 | ($code >> 12)) . chr(128 | (($code >> 6) & 63)) . chr(128 | ($code & 63));
    else return chr(240 | ($code >> 18)) . chr(128 | (($code >> 12) & 63)) . chr(128 | (($code >> 6) & 63)) . chr(128 | ($code & 63));
}

// Callback for preg_replace_callback('~&(#(x?))?([^;]+);~', 'html_entity_replace', $str);
function html_entity_replace($matches)
{
    if ($matches[2])
    {
        // hexadecimal reference, e.g. &#x20AC;
        return chr_utf8(hexdec($matches[3]));
    } elseif ($matches[1])
    {
        // decimal reference, e.g. &#8364;
        return chr_utf8($matches[3]);
    }
    switch ($matches[3])
    {
        case "nbsp": return chr_utf8(160);
        case "iexcl": return chr_utf8(161);
        case "cent": return chr_utf8(162);
        case "pound": return chr_utf8(163);
        case "curren": return chr_utf8(164);
        case "yen": return chr_utf8(165);
        //... etc with all named HTML entities
    }
    // unknown name: leave the text untouched instead of deleting it
    return $matches[0];
}

function htmlentities2utf8($string) // because of the html_entity_decode() bug with UTF-8
{
    $string = preg_replace_callback('~&(#(x?))?([^;]+);~', 'html_entity_replace', $string);
    return $string;
}
?>
loufoque 08-Oct-2005 03:15
If you want to decode NCRs to utf-8 use this function instead of chr().

<?php
function utf8_chr($code)
{
    // Each shift/mask is parenthesised: + binds tighter than >> and & in PHP.
    if($code<128) return chr($code);
    else if($code<2048) return chr(($code>>6)+192).chr(($code&63)+128);
    else if($code<65536) return chr(($code>>12)+224).chr((($code>>6)&63)+128).chr(($code&63)+128);
    else if($code<2097152) return chr(($code>>18)+240).chr((($code>>12)&63)+128)
                                  .chr((($code>>6)&63)+128).chr(($code&63)+128);
}
?>
florianborn (at) yahoo (dot) de 20-Jul-2005 05:43
Note that

<?php
echo urlencode(html_entity_decode("&nbsp;"));
?>

will output "%A0" instead of "+".
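
If a plain "+" is what you want, one option is to swap the non-breaking space for an ordinary space before encoding. This sketch assumes the text is decoded to UTF-8, where the non-breaking space is the byte pair C2 A0 and therefore encodes as "%C2%A0" instead of "%A0"; the sample string is made up.

```php
<?php
// Decode to UTF-8 explicitly so the bytes are predictable.
$decoded = html_entity_decode('a&nbsp;b', ENT_QUOTES, 'UTF-8');

echo urlencode($decoded), "\n"; // a%C2%A0b

// Replace the non-breaking space with a regular space first.
echo urlencode(str_replace("\xC2\xA0", ' ', $decoded)), "\n"; // a+b
```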
gaui at gaui dot is 04-Jul-2005 07:15
[If you are missing the html_entity_decode() function in your version of PHP, you may wish to try this code snippet.]

<?php
if( !function_exists( 'html_entity_decode' ) )
{
    function html_entity_decode( $given_html, $quote_style = ENT_QUOTES ) {
        $trans_table = array_flip(get_html_translation_table( HTML_SPECIALCHARS, $quote_style ));
        $trans_table['&#39;'] = "'";
        return ( strtr( $given_html, $trans_table ) );
    }
}
?>
marius (at) hot (dot) ee 08-Apr-2005 08:40
To convert html entities into unicode characters, use the following:

<?php
        $trans_tbl = get_html_translation_table(HTML_ENTITIES);
        $ttr = array();
        foreach($trans_tbl as $k => $v)
        {
            // map each entity to the UTF-8 form of its character
            $ttr[$v] = utf8_encode($k);
        }

        $text = strtr($text, $ttr);
?>
php dot net at c dash ovidiu dot tk 18-Mar-2005 01:37
Quick & dirty code that translates numeric entities to UTF-8.

<?php

   
    function replace_num_entity($ord)
    {
        // $ord[1] is either a decimal number or "x" followed by hex digits
        $ord = $ord[1];
        if (preg_match('/^x([0-9a-f]+)$/i', $ord, $match))
        {
            $ord = hexdec($match[1]);
        }
        else
        {
            $ord = intval($ord);
        }

        $no_bytes = 0;
        $byte = array();

        if ($ord < 128)
        {
            return chr($ord);
        }
        elseif ($ord < 2048)
        {
            $no_bytes = 2;
        }
        elseif ($ord < 65536)
        {
            $no_bytes = 3;
        }
        elseif ($ord < 1114112)
        {
            $no_bytes = 4;
        }
        else
        {
            return;
        }

        // mask and marker bits for the leading byte of the sequence
        switch($no_bytes)
        {
            case 2:
            {
                $prefix = array(31, 192);
                break;
            }
            case 3:
            {
                $prefix = array(15, 224);
                break;
            }
            case 4:
            {
                $prefix = array(7, 240);
            }
        }

        // take 6 bits at a time, lowest bits go into the last byte
        for ($i = 0; $i < $no_bytes; $i++)
        {
            $byte[$no_bytes - $i - 1] = (($ord >> (6 * $i)) & 63) | 128;
        }

        $byte[0] = ($byte[0] & $prefix[0]) | $prefix[1];

        $ret = '';
        for ($i = 0; $i < $no_bytes; $i++)
        {
            $ret .= chr($byte[$i]);
        }

        return $ret;
    }

    $test = 'This is a &#269;&#x5d0; test&#39;';

    echo $test . "<br />\n";
    echo preg_replace_callback('/&#([0-9a-fx]+);/mi', 'replace_num_entity', $test);

?>
Silvan 28-Jan-2005 08:33
Passing NULL or FALSE as a string will generate a '500 Internal Server Error' (or break the script when inside a function).

So always test your string first before passing it to html_entity_decode().
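
One way to do that check is sketched below. The wrapper name safe_entity_decode() and its choice to return an empty string for non-string input are assumptions made for this illustration, not built-in behaviour.

```php
<?php
// Hypothetical wrapper (not part of PHP): anything that is not a string
// yields an empty string instead of being passed to html_entity_decode().
function safe_entity_decode($value, $charset = 'UTF-8')
{
    if (!is_string($value)) {
        return '';
    }
    return html_entity_decode($value, ENT_QUOTES, $charset);
}

var_dump(safe_entity_decode(null));        // string(0) ""
var_dump(safe_entity_decode(false));       // string(0) ""
var_dump(safe_entity_decode('&lt;p&gt;')); // string(3) "<p>"
```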
daniel at brightbyte dot de 13-Nov-2004 07:12
This function seems to have two limitations (at least in PHP 4.3.8):

a) it does not work with multibyte character encodings, such as UTF-8
b) it does not decode numeric entity references

a) can be solved by using iconv to convert to ISO-8859-1, then decoding the entities, then converting to UTF-8 again. But that's quite ugly and destroys all characters not present in Latin-1.

b) can be solved rather nicely using the following code:

<?php
function decode_entities($text) {
    $text = html_entity_decode($text, ENT_QUOTES, "ISO-8859-1"); #NOTE: UTF-8 does not work!
    $text = preg_replace('/&#(\d+);/me', "chr(\\1)", $text); #decimal notation
    $text = preg_replace('/&#x([a-f0-9]+);/mei', "chr(0x\\1)", $text);  #hex notation
    return $text;
}
?>

HTH
aidan at php dot net 14-Sep-2004 02:57
This functionality is now implemented in the PEAR package PHP_Compat.

More information about using this function without upgrading your version of PHP can be found at the link below:

http://pear.php.net/package/PHP_Compat

 