转载

itpub网友问题之AL32UTF8与ZHS16GBK 2种数据库字符集database characterset

itpub论坛问题

    原问题:http://www.itpub.net/thread-1943065-1-4.html
   安装了数据库11.2.0.4,误将字符集设置为AL32UTF8,本来应该是ZHS16GBK;通过应用软件录入了一些数据,直到一个窗口报错才发现字符集选择错了。
因为AL32UTF8是ZHS16GBK的超集,所以直接将字符集转换为ZHS16GBK后,应用软件进去后,所有中文都变成了乱码,看来直接转换字符集这条路走不通了。


现在想问,有什么办法能将AL32UTF8库里数据导出,导入到新安装的ZHS16GBK库中?
在AL32UTF8库导入前将导入环境字符集设置为ZHS16GBK?然后导出?

分析思路

itpub网友问题之AL32UTF8与ZHS16GBK 2种数据库字符集database characterset

扩展问题

1,对于数据库字符集没有完全理解,还是浮于表面
2,有空多看官方手册 Oracle? Database Globalization Support Guide



结论

1,不同的地理位置,适用的数据库字符集不同,引申含义即比如我们是中国就应采用这个地理位置的字符集
2,如果原数据库字符集与目标数据库字符集不是父子关系,运行转化数据库字符集会提示如下错误
SQL> ALTER DATABASE CHARACTER SET ZHS16GBK;
ALTER DATABASE CHARACTER SET ZHS16GBK
*
ERROR at line 1:
ORA-12712: new character set must be a superset of old character set


3,父子集数据库字符集转换不会造成数据丢失及存储空间增大2种问题,中文字符不会在转换显示乱码,比如:?之类的
4,字符集AL32UTF8与ZHS16GBK不是父子集数据库字符集关系,所以肯定不能转换,也就回答了坛友的问题了,在字符集转换后肯定会发生乱码,只有父子集数据库字符集方可正常转换
    所以一定要在创建数据库前把数据库字符集考虑好,一旦有了业务数据,再转换就风险极大了
5, convert函数可以转换指定字符由某数据库字符集转换为另一种数据库字符集,用这个函数我认为最大价值在于预知某些字符是否可以正常转换,即产生数据损失即产生乱码
   也可以发现空间增加的问题
itpub网友问题之AL32UTF8与ZHS16GBK 2种数据库字符集database characterset

6, v$nls_valid_values可以获取数据库字符集的信息,共计 499种字符集
7,数据库字符集编码方案共计2大类:
   单字节及多字节,其中又细分很多子类,请见下测试


测试

---查看与nls相关的字典
SQL> select table_name from dict where lower(table_name) like '%nls%';


TABLE_NAME
------------------------------------------------------------
NLS_DATABASE_PARAMETERS
NLS_INSTANCE_PARAMETERS
NLS_SESSION_PARAMETERS
V$NLS_PARAMETERS
V$NLS_VALID_VALUES
GV$NLS_PARAMETERS
GV$NLS_VALID_VALUES


7 rows selected.


SQL> desc nls_database_parameters;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 PARAMETER                                 NOT NULL VARCHAR2(30)
 VALUE                                              VARCHAR2(40)


SQL> select count(*) from nls_database_parameters;


  COUNT(*)
----------
        20


当前数据库字符集是WE8MSWIN1252
SQL> col parameter for a30
SQL> col value for a30
SQL> select parameter,value from nls_database_parameters;


PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               WE8MSWIN1252
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM


PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_RDBMS_VERSION              11.2.0.1.0


20 rows selected.




---官方手册说,如果数据库端字符集不能包括了客户端的字符集,则会发生数据丢失,即产生乱码,说白了就是用乱码替换识别到的字符集,所以一定要确认数据库字符集是客户端
字符集的父集
Figure 2-6 shows that data loss occurs when the database character set does not include all of the characters in the client character set. 
The database character set is US7ASCII. The client's character set is WE8MSWIN1252, and the language used by the client is German. When the 
client inserts a string that contains ?, the database replaces ? with ?, resulting in lost data.

itpub网友问题之AL32UTF8与ZHS16GBK 2种数据库字符集database characterset


---基于上述信息,查阅如下可以获取官方推荐的基于不同全球的数据库字符集


Oracle? Database Globalization Support Guide
11g Release 2 (11.2)
Part Number E10729-07


A Locale Data


Recommended Database Character Sets
itpub网友问题之AL32UTF8与ZHS16GBK 2种数据库字符集database characterset

itpub网友问题之AL32UTF8与ZHS16GBK 2种数据库字符集database characterset

itpub网友问题之AL32UTF8与ZHS16GBK 2种数据库字符集database characterset

由这里可见我当前选择的数据库字符集WE8MSWIN1252 明显不正常的,因为这个字符集适用于欧洲,或者我应该选择支持粒度更好的AL32UTF8

再摘录下父子集数据库字符集列表(源自官方手册)
itpub网友问题之AL32UTF8与ZHS16GBK 2种数据库字符集database characterset

itpub网友问题之AL32UTF8与ZHS16GBK 2种数据库字符集database characterset

itpub网友问题之AL32UTF8与ZHS16GBK 2种数据库字符集database characterset

---基于问题说的构建一个字符集AL32UTF8的数据库
SQL> select parameter,value from nls_database_parameters where parameter='NLS_CHARACTERSET';


PARAMETER                                          VALUE
-------------------------------------------------- ------------------------------
NLS_CHARACTERSET                                   AL32UTF8


--插入中文数据
SQL> conn scott/system
Connected.
SQL> create table t_charset(a varchar2(100));


Table created.


SQL> insert into t_charset values('我们');


1 row created.


SQL> commit;


Commit complete.


SQL> col a for a30
SQL> select a,dump(a) from t_charset;


A                              DUMP(A)
------------------------------ --------------------------------------------------
我们                           Typ=1 Len=9: 233,142,180,230,136,156,230,187,145




修改数据库字符集AL32UTF8 为ZHS16GBK ,验证ITPUB网友的问题


SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.


Total System Global Area 1653518336 bytes
Fixed Size                  2213896 bytes
Variable Size             956303352 bytes
Database Buffers          687865856 bytes
Redo Buffers                7135232 bytes
Database mounted.
SQL> ALTER SYSTEM ENABLE RESTRICTED SESSION;


System altered.


SQL> ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;


System altered.


SQL> ALTER SYSTEM SET AQ_TM_PROCESSES=0;


System altered.


SQL> alter database open;


Database altered.


这里也可以再次验证ZHS16GBK不是AL32UTF8的超集
SQL> ALTER DATABASE CHARACTER SET ZHS16GBK;
ALTER DATABASE CHARACTER SET ZHS16GBK
*
ERROR at line 1:
ORA-12712: new character set must be a superset of old character set




SQL> ALTER DATABASE character set INTERNAL_USE ZHS16GBK;


Database altered.






SQL> select parameter,value from nls_database_parameters where parameter='NLS_CHARACTERSET';


PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_CHARACTERSET               ZHS16GBK


[oracle@seconary ~]$ sqlplus scott/system


SQL*Plus: Release 11.2.0.1.0 Production on Sun Nov 29 06:26:20 2015


Copyright (c) 1982, 2009, Oracle.  All rights reserved.


ERROR:
ORA-01035: ORACLE only available to users with RESTRICTED SESSION privilege




SQL> alter system disable restricted session;


System altered.




检查中文字符报错
SQL> conn scott/system
Connected.
SQL> select a,dump(a) from t_charset;
SP2-0784: Invalid or incomplete character beginning 0x91 returned


SQL> select count(*) from t_charset;


  COUNT(*)
----------
         1




SQL> select dump(a) from t_charset;


DUMP(A)
--------------------------------------------------------------------------------
Typ=1 Len=9: 233,142,180,230,136,156,230,187,145


SQL> insert into t_charset values('别人');


1 row created.


SQL> commit;


Commit complete.


SQL> select * from t_charset;


A
--------------------------------------------------------------------------------
别人
SP2-0784: Invalid or incomplete character beginning 0x91 returned


--可见zhs16gbk每个中文3个字节,而AL32UTF8是4.5个字节,而且可见如果旧数据库字符集和新字符集不是父子关系,可能会产生乱码
SQL> select a,dump(a) from t_charset;


A                                                  DUMP(A)
-------------------------------------------------- --------------------------------------------------
别人                                              Typ=1 Len=6: 229,136,171,228,186,186
我们                                               Typ=1 Len=6: 230,136,145,228,187,172
SP2-0784: Invalid or incomplete character beginning 0x91 returned






再把字符集恢复回AL32UTF8
SQL> select parameter,value from nls_database_parameters where parameter='NLS_CHARACTERSET';


PARAMETER                                          VALUE
-------------------------------------------------- --------------------------------------------------
NLS_CHARACTERSET                                   AL32UTF8




经查官方手册,数据库字符集转换有如下2种风险:
Potential data loss
Increased overhead




SQL> truncate table t_charset;


Table truncated.


SQL> select a,dump(a) from t_charset;


no rows selected




测试下父子集数据库字符集切换基于中文,会不会产生乱码
SQL> ALTER DATABASE character set INTERNAL_USE  US7ASCII;


Database altered.




SQL> select parameter,value from nls_database_parameters where parameter='NLS_CHARACTERSET';


PARAMETER                                          VALUE
-------------------------------------------------- --------------------------------------------------
NLS_CHARACTERSET                                   US7ASCII




可见us7ascii数据库字符集不能正常显示中文字符,一查官方手册,原来这个字符集是隶属于欧洲,当然不能显示亚洲的字符
SQL> insert into t_charset values('一');


1 row created.


SQL> commit;


Commit complete.




SQL> select a,dump(a) from t_charset;


A          DUMP(A)
---------- ------------------------------
??         Typ=1 Len=2: 63,63




SQL> truncate table t_charset;


Table truncated.


测一个亚洲中文的字符集,也是显示不行中文 CGB2312-80 16-bit Simplified Chinese
SQL> ALTER DATABASE character set INTERNAL_USE  ZHS16CGB231280;


SQL> insert into t_charset values('一');


1 row created.


SQL> commit;


Commit complete.


SQL> select a,dump(a) from t_charset;


A          DUMP(A)
---------- ------------------------------
?          Typ=1 Len=3: 228,184,63


Database altered.


继续测试下UTF8数据库字符集,可见如果由子集转化为父集数据库字符集,不会发生数据丢失及空间增加的2种问题
SQL> ALTER DATABASE character set INTERNAL_USE UTF8;


Database altered.


SQL> insert into t_charset values('伙伴');


1 row created.


SQL> commit;


Commit complete.


SQL> select a,dump(a) from t_charset;


A          DUMP(A)
---------- --------------------------------------------------
伙伴       Typ=1 Len=9: 230,181,188,230,172,142,229,141,179






SQL> conn /as sysdba
Connected.
SQL> alter system enable restricted session;


System altered.


SQL> ALTER DATABASE character set AL32UTF8;


Database altered.


SQL> alter system disable restricted session;


System altered.


SQL> conn scott/system
Connected.
SQL> select a,dump(a) from t_charset;


A          DUMP(A)
---------- --------------------------------------------------
一         Typ=1 Len=6: 230,182,147,226,130,172
伙伴       Typ=1 Len=9: 230,181,188,230,172,142,229,141,179




--查看所有的数据库字符集
SQL> select count(*) from v$nls_valid_values;


  COUNT(*)
----------
       499




SQL> select parameter,value from v$nls_valid_values where parameter='CHARACTERSET' and VALUE  like '%ZH%';


PARAMETER                                          VALUE
-------------------------------------------------- --------------------------------------------------
CHARACTERSET                                       ZHS16CGB231280
CHARACTERSET                                       ZHS16MACCGB231280
CHARACTERSET                                       ZHS16GBK
中间略
CHARACTERSET                                       ZHT16BIG5FIXED


22 rows selected.




SQL> select parameter,value from v$nls_valid_values where parameter='CHARACTERSET' and VALUE  like '%AL%';


PARAMETER                                          VALUE
-------------------------------------------------- --------------------------------------------------
CHARACTERSET                                       AL24UTFFSS
CHARACTERSET                                       AL32UTF8
CHARACTERSET                                       




SQL> select convert('ABC','AL16UTF16','UTF8'),length(convert('ABC','AL16UTF16','UTF8')) from dual;


CONVERT('ABC LENGTH(CONVERT('ABC','AL16UTF16','UTF8'))
------------ -----------------------------------------
 A B C                                               6




SQL> select convert('a123','UTF8','UTF8') from dual;


CONVERT(
--------
a123


SQL> select length('12'),length(convert('12','AL16UTF16','UTF8')) from dual;


LENGTH('12') LENGTH(CONVERT('12','AL16UTF16','UTF8'))
------------ ----------------------------------------
           2                                        4

数据库字符集的编码规则 共计2大类(单字节虽然效率高且空间占用小,但表示的字符有限):
Single-Byte Encoding Schemes


     7-bit encoding schemes


     8-bit encoding schemes


Multibyte Encoding Schemes


    Fixed-width multibyte encoding schemes


   Variable-width multibyte encoding schemes


   Shift-sensitive variable-width multibyte encoding schemes


Naming Convention for Oracle Database Character Sets
[S|C]


The parts of the names that appear between angle brackets are concatenated. 
The optional S or C is used to differentiate character sets that can be used only on the server (S) or only on the client (C).

itpub网友问题之AL32UTF8与ZHS16GBK 2种数据库字符集database characterset

正文到此结束
Loading...