A character-based alternative for Unicode (UTF-8) strings:
<?php
// Split the string into UTF-8 characters, then slice by character position.
function substr_unicode($str, $s, $l = null) {
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}
$str = "Büyük";
$s = 0; // start at character 0
$l = 3; // take 3 characters
echo substr($str, $s, $l) ."\n"; // Bü (counts bytes; "ü" takes two bytes in UTF-8)
echo mb_substr($str, $s, $l) ."\n"; // Bü when the internal encoding is not UTF-8
echo substr_unicode($str, $s, $l); // Büy
?>
mb_substr
(PHP 4 >= 4.0.6, PHP 5)
mb_substr - Returns part of a string
Description
string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )
Performs a multi-byte safe substr() operation based on the number of
characters. Position is counted from the beginning of
str. The position of the first character is 0, of the second 1, and so on.
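As a rough illustration of the description above, the following sketch compares byte-based and character-based positions. It assumes the script file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only for illustration.

```php
<?php
// "Büyük" is 5 characters but 7 bytes in UTF-8 ("ü" takes two bytes).
$str = "Büyük";

// substr() counts bytes: bytes 1 and 2 are the two bytes of the first "ü".
$bytes = substr($str, 1, 2);
// mb_substr() counts characters: characters 1 and 2 are "ü" and "y".
$chars = mb_substr($str, 1, 2, 'UTF-8');

echo strlen($str), "\n";             // 7
echo mb_strlen($str, 'UTF-8'), "\n"; // 5
echo $bytes, "\n";                   // ü
echo $chars, "\n";                   // üy
```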
Parameters
- str - The source string from which the substring is extracted.
- start - The position of the character in str at which the substring begins.
- length - The maximum number of characters in the returned substring.
- encoding - The encoding parameter is the character encoding. If it is omitted, the internal encoding value is used.
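The effect of the encoding parameter can be sketched as follows. This is only an illustration, assuming a UTF-8 source file and the mbstring extension; it also suggests why a call without a UTF-8 encoding can return the same result as the byte-based substr().

```php
<?php
$str = "Büyük"; // stored as UTF-8: 5 characters, 7 bytes

// UTF-8: positions are counted in characters.
$utf8 = mb_substr($str, 0, 3, 'UTF-8');
// ISO-8859-1 is a single-byte encoding, so every byte counts as one character
// and the result matches a plain byte-based substr().
$latin1 = mb_substr($str, 0, 3, 'ISO-8859-1');

echo $utf8, "\n";                         // Büy
var_dump($latin1 === substr($str, 0, 3)); // bool(true)

// Setting the internal encoding once lets the fourth argument be omitted.
mb_internal_encoding('UTF-8');
$implicit = mb_substr($str, 0, 3);
echo $implicit, "\n";                     // Büy
```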
Return Values
mb_substr() returns at most length
characters of the source string str, starting at position start.
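A small sketch of the "at most length characters" rule, assuming UTF-8 input and the mbstring extension:

```php
<?php
$str = "Büyük"; // 5 characters

// Only two characters remain from position 3, so only those are returned
// even though 10 were requested.
$tail = mb_substr($str, 3, 10, 'UTF-8');
// A length of 0 yields an empty string.
$none = mb_substr($str, 1, 0, 'UTF-8');

var_dump($tail); // string(3) "ük"
var_dump($none); // string(0) ""
```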
See Also
- mb_strcut() - Get part of a string
- mb_internal_encoding() - Set/get the internal character encoding of the script
qeremy [atta] gmail [dotta] com
27-Feb-2012 04:58
p dot assenov at aip-solutions dot com
02-Dec-2011 10:17
I was trying to capitalize only the first character of a string and tried some of the examples above, but they didn't work. It seems mb_substr() cannot work out the length of the string in a multi-byte encoding (UTF-8) on its own, so the length should be set explicitly. Here is the corrected version:
<?php
// Newer PHP versions may already provide mb_ucfirst(); define it only if it is missing.
if (!function_exists('mb_ucfirst')) {
    function mb_ucfirst($str, $enc = 'utf-8') {
        return mb_strtoupper(mb_substr($str, 0, 1, $enc), $enc).mb_substr($str, 1, mb_strlen($str, $enc), $enc);
    }
}
?>
cheers!
levani9191 at gmail dot com
18-Jul-2010 05:37
A simple snippet that checks whether the last character of the string is a question mark and appends one if it isn't:
<?php $string = (mb_substr($string, -1, 1, 'UTF-8') != '?') ? $string.'?' : $string; ?>
Anonymous
26-Feb-2010 06:15
If start is negative, the returned string starts at the start'th character from the end of the string.
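A minimal sketch of a negative start, assuming UTF-8 input and the mbstring extension:

```php
<?php
$str = "Büyük"; // characters: B(0) ü(1) y(2) ü(3) k(4)

// -2 means "start two characters before the end of the string".
$lastTwo = mb_substr($str, -2, 2, 'UTF-8');
// -3 starts at "y"; a length of 2 then takes "y" and "ü".
$middle = mb_substr($str, -3, 2, 'UTF-8');

echo $lastTwo, "\n"; // ük
echo $middle, "\n";  // yü
```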
dziamid at gmail dot com
06-Feb-2009 09:27
Here is my solution to highlighting search queries in multibyte text:
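<?php
function mb_highlight($data, $query, $ins_before, $ins_after)
{
    // An empty query could match at position 0 forever, so return the text unchanged.
    if ($query === '') {
        return $data;
    }
    $result = '';
    $query_len = mb_strlen($query);
    while (($poz = mb_strpos(mb_strtolower($data), mb_strtolower($query))) !== false)
    {
        // Copy the text before the match, then the match wrapped in the markers.
        $result .= mb_substr($data, 0, $poz).
                   $ins_before.
                   mb_substr($data, $poz, $query_len).
                   $ins_after;
        // Continue searching in the text after the match.
        $data = mb_substr($data, $poz + $query_len);
    }
    // Append the text that follows the last match.
    return $result.$data;
}
?>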
Enjoy!
[EDIT BY danbrown AT php DOT net: Reclassified to a more appropriate function manual page.]
projektas at gmail dot com
21-Oct-2008 08:29
First letter in upper case:
<?php
header('Content-type: text/html; charset=utf-8');
if (isset($_POST['check']) && !empty($_POST['check'])) {
    echo htmlspecialchars(ucfirst_utf8($_POST['check']));
} else {
    echo htmlspecialchars(ucfirst_utf8('Žąsinų'));
}
// Upper-cases the first character and lower-cases the rest of a UTF-8 string.
function ucfirst_utf8($str) {
    if (mb_check_encoding($str, 'UTF-8')) {
        $first = mb_substr(
            mb_strtoupper($str, 'utf-8'), 0, 1, 'utf-8'
        );
        return $first.mb_substr(
            mb_strtolower($str, 'utf-8'), 1, mb_strlen($str, 'utf-8'), 'utf-8'
        );
    } else {
        return $str;
    }
}
?>
<form method="post" action="">
<input type="text" name="check" />
<input type="submit" />
</form>
Silvan
01-Sep-2007 05:30
Passing null as length will not make mb_substr() use its default; instead, null is interpreted as 0.
<?php
mb_substr($str, $start, null, $encoding); // Returns '' (empty string), just like substr()
?>
Instead use:
<?php
mb_substr($str,$start,mb_strlen($str),$encoding);
?>
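The workaround can be wrapped in a small helper so that the result does not depend on how a particular PHP version treats a null length (as far as I know, the manual's changelog says PHP 5.4.8 and later treat null as "up to the end of the string"). The name mb_substr_to_end() is hypothetical, not part of mbstring.

```php
<?php
// Hypothetical helper: return everything from $start to the end of $str.
// Passing the full character count as length avoids a null length entirely.
function mb_substr_to_end($str, $start, $encoding = 'UTF-8') {
    return mb_substr($str, $start, mb_strlen($str, $encoding), $encoding);
}

$rest = mb_substr_to_end("Büyük", 2);
echo $rest, "\n"; // yük
```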
xiaogil at yahoo dot fr
02-Aug-2005 10:33
Thanks to Darien from /freenode #php for the following example (slightly changed).
It just prints the 6th character of $string.
You can replace the digits with their equivalents in Japanese, Chinese, or any other language to test it; it works perfectly.
<?php
mb_internal_encoding("UTF-8");
$string = "0123456789";
$mystring = mb_substr($string,5,1);
echo $mystring;
?>
(I couldn't replace 0123456789 with Chinese numerals here, for example, because this website automatically converts them into Latin digits. Look:
零一二三四
五六七八九)
gilv
drraf at tlen dot pl
23-Feb-2005 07:44
Note: if the requested range lies outside the string, mb_substr() returns an empty _string_, whereas substr() returns _boolean_ false in this case.
Keep this in mind when using "===" comparisons.
Example code:
<?php
var_dump( substr( 'abc', 5, 2 ) ); // returns "false"
var_dump( mb_substr( 'abc', 5, 2 ) ); // returns ""
?>
It's especially confusing when using mbstring with function overloading turned on.
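One way to avoid depending on the exact return type is to treat both false and the empty string as "nothing extracted". This is only a sketch; as far as I know, substr() returns an empty string instead of false from PHP 8.0 on, so the check below is written to hold in either case.

```php
<?php
// Out-of-range request: the start position is beyond the end of 'abc'.
$mb  = mb_substr('abc', 5, 2, 'UTF-8');
$raw = substr('abc', 5, 2);

// mb_substr() gives an empty string here.
var_dump($mb === ''); // bool(true)

// Accept both possible substr() results (false or '') instead of testing
// for one of them with a single strict comparison.
$isEmpty = ($raw === false || $raw === '');
var_dump($isEmpty); // bool(true)
```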
