
PHP and UTF-8

Author: bamboo06 on 14-08-2019, 18:43, views: 3023



Handling UTF-8 in PHP
PHP provides several functions for extracting part of a string; the most commonly used are substr() and, from the mbstring extension, mb_substr().
When working with Unicode text, use the mbstring counterpart instead of the native string function. For example, in a UTF-8 encoded PHP file, strlen() returns the number of bytes rather than the number of characters; use mb_strlen() instead.
If you use substr() to cut a string of Chinese characters, garbled characters can appear, because substr() counts bytes. In UTF-8 a Chinese character usually takes 3 bytes, so a cut that lands inside a character keeps only part of it (for example 1/3), which is displayed as garbage. Other languages likewise produce multi-byte strings, so use mb_substr() and mb_strlen() to handle them.
If you don't know the encoding of a string, you can check it with mb_detect_encoding(). Keep in mind that the result is a best guess based on the bytes, not a guarantee.
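As a rough sketch of how such a check might look, the helper below wraps mb_detect_encoding() in strict mode. The function name guessEncoding and the candidate list are assumptions made for this illustration, not part of PHP.

```php
<?php
// Hypothetical helper: returns the first plausible encoding from the
// candidate list, or null when none of the candidates fits the bytes.
function guessEncoding(string $bytes, array $candidates = ['UTF-8', 'GBK']): ?string
{
    // The third argument enables strict detection.
    $encoding = mb_detect_encoding($bytes, $candidates, true);
    return $encoding === false ? null : $encoding;
}

// mb_check_encoding() answers a narrower question: are these bytes valid
// in the given encoding?
$validUtf8   = mb_check_encoding('中文', 'UTF-8');      // true
$invalidUtf8 = mb_check_encoding("\xD6\xD0", 'UTF-8'); // false: not a valid UTF-8 sequence

$guess = guessEncoding('中文');
```

When the expected encoding is known in advance, validating with mb_check_encoding() is usually more reliable than guessing.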
To convert between encodings, use iconv(). It returns the converted string, so assign the result. For example, from GB2312 to UTF-8:
$text = iconv("GB2312", "UTF-8", $text);
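A small round trip illustrates what the conversion does to the bytes. This is a sketch that assumes the script is saved as UTF-8 and that the local iconv library supports GBK.

```php
<?php
$utf8 = '中文';

// iconv(from, to, string) returns the converted string, or false on failure.
$gbk = iconv('UTF-8', 'GBK', $utf8);
if ($gbk === false) {
    throw new RuntimeException('Conversion to GBK failed');
}

// GBK stores each of these characters in 2 bytes; UTF-8 needs 3.
$gbkBytes  = strlen($gbk);   // 4
$utf8Bytes = strlen($utf8);  // 6

// Converting back restores the original UTF-8 string.
$roundTrip = iconv('GBK', 'UTF-8', $gbk);
```

Checking for false matters because iconv() fails when the input contains bytes that are illegal in the source encoding.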

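mb_convert_encoding() is an alternative when you cannot be sure of the original encoding, or when the text still does not display properly after an iconv() conversion. Its third argument accepts a list of candidate source encodings to try; if it is omitted, the source is assumed to be in the internal encoding. The candidate list below is only an example:
$str = mb_convert_encoding($str, "UTF-8", "UTF-8, GBK");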
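One way to make this kind of fallback explicit is to validate first and convert only when needed. The helper name toUtf8 and the choice of GBK as the fallback are assumptions for this sketch; in a real application the fallback should match where the data comes from.

```php
<?php
// Hypothetical helper: returns the input unchanged when it is already valid
// UTF-8, otherwise converts it from an assumed fallback encoding.
function toUtf8(string $bytes, string $fallbackEncoding = 'GBK'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $fallbackEncoding);
}

$gbkBytes = "\xD6\xD0\xCE\xC4";   // "中文" encoded as GBK
$fromGbk  = toUtf8($gbkBytes);    // converted to UTF-8
$fromUtf8 = toUtf8('中文');        // already UTF-8, returned as-is
```

The design choice here is to avoid guessing among many encodings: text that is not valid UTF-8 is treated as the single legacy encoding the application expects.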

php.ini also has a default_charset option, which can be set to default_charset = "UTF-8". Many string functions such as htmlentities() use it as their default character set, and PHP uses the same value for the Content-Type header of its responses.
You can also set the character set explicitly with header():
header("Content-Type: text/html;charset=utf-8");

The HTML document itself should also declare the character set with this meta tag in its <head>:
<meta charset="UTF-8" />
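Putting these output settings together might look like the sketch below. It sets default_charset at runtime with ini_set() instead of editing php.ini, and passes the encoding to htmlentities() explicitly; the sample markup is only an illustration.

```php
<?php
// Runtime equivalent of default_charset = "UTF-8" in php.ini.
ini_set('default_charset', 'UTF-8');

// Send the header only in a web context and only if output has not started.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// With UTF-8 specified, the multi-byte characters pass through untouched
// and only the HTML special characters are escaped.
$escaped = htmlentities('<p>中文</p>', ENT_QUOTES, 'UTF-8');
// $escaped is now "&lt;p&gt;中文&lt;/p&gt;"
```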


Garbled file names in browser downloads
Even when both the PHP code and the output data use UTF-8, there are still surprises. One pitfall I ran into: when downloading a file from the server, IE and Edge show a garbled name if the file name is Chinese, and when the file name contains a space, the space in the downloaded file name becomes a + sign.
When PHP serves a file download through header(), you also need to take the browser and the operating system into account (most people use Windows). For Chrome, the file name can be sent as UTF-8; Chrome converts it automatically to the system encoding (GBK on Chinese Windows).
IE follows the operating system's encoding, so a Chinese file name must be transcoded from UTF-8 to GBK; otherwise the user sees a garbled file name when downloading. One solution:
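// IE 11 reports "Trident" rather than "MSIE" in its user agent string.
$agent = $_SERVER["HTTP_USER_AGENT"] ?? "";
if (strpos($agent, 'MSIE') !== false || strpos($agent, 'Trident') !== false) {
    // IE expects the file name in the system encoding (GBK on Chinese Windows).
    $filename = iconv("UTF-8", "GBK", "chi.pdf");
    header("Content-Disposition: attachment; filename=\"$filename\"");
}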
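An alternative that avoids user agent sniffing is the filename* parameter defined in RFC 6266, which carries a percent-encoded UTF-8 name and is generally understood by modern browsers. The sketch below is not the approach used above; the helper name buildContentDisposition is hypothetical. It uses rawurlencode(), which encodes a space as %20 rather than +.

```php
<?php
// Hypothetical helper: builds a Content-Disposition value with an ASCII
// fallback name plus an RFC 6266 / RFC 5987 encoded UTF-8 name.
function buildContentDisposition(string $utf8Name): string
{
    // Fallback for old clients: replace every non-ASCII character,
    // double quote, and backslash with an underscore.
    $fallback = preg_replace('/[^\x20-\x7E]|["\\\\]/u', '_', $utf8Name) ?? 'download';

    // Percent-encode the UTF-8 bytes; spaces become %20, not "+".
    $encoded = rawurlencode($utf8Name);

    return sprintf("attachment; filename=\"%s\"; filename*=UTF-8''%s", $fallback, $encoded);
}

$value = buildContentDisposition('中文 file.pdf');
// In a web request, before any output is sent:
// header('Content-Disposition: ' . $value);
```

Browsers that understand filename* prefer it over the plain filename parameter, so the ASCII fallback only matters for older clients.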


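Garbled text from MySQL
First make sure your MySQL databases, tables, and columns use UTF-8. Then make sure the MySQL client also uses UTF-8 when connecting: in PHP, that means setting UTF-8 as the connection character set when connecting through the mysqli or PDO extension. When both sides agree, garbled text generally does not occur. Note that MySQL's utf8 character set stores at most 3 bytes per character; utf8mb4 covers all of UTF-8, including 4-byte characters such as emoji.
For example, when connecting to MySQL with PDO, set the connection character set to UTF-8: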
$pdo->query('SET NAMES utf8;');
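The connection character set can also be declared in the PDO DSN through its charset parameter, so it applies from the moment the connection opens. The sketch below only builds the DSN string; the helper name buildMysqlDsn, the host, the database name, and the credentials in the commented line are placeholders.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN with an explicit charset.
function buildMysqlDsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

$dsn = buildMysqlDsn('localhost', 'example_db');

// Connecting requires a running MySQL server, so it is left commented out:
// $pdo = new PDO($dsn, 'example_user', 'example_password', [
//     PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
// ]);
```

With mysqli, the equivalent step is calling set_charset('utf8mb4') on the connection object.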


Summary
1. Be sure to know the character encoding of the data when processing multi-byte strings;
2. Use UTF-8 character encoding to store data;
3. Use UTF-8 character encoding to output data.

Tags: php, utf8

Category: PHP Scripts
