Rails: encoding woes with serialized hashes despite UTF8

I've just updated from ruby 1.9.2 to ruby 1.9.3p0 (2011-10-30 revision 33570). My rails application uses postgresql as its database backend. The system locale is UTF8, as is the database encoding. The default encoding of the rails application is also UTF8. I have Chinese users who input Chinese characters as well as English characters. The strings are stored as UTF8 encoded strings. 
Rails version: 3.0.9 
Since the update some of the existing Chinese strings in the database are no longer displayed correctly. This does not affect all strings, but only those that are part of a serialized hash. All other strings that are stored as plain strings still appear to be correct. 
 
Example: 
This is a serialized hash that is stored as a UTF8 string in the database: 
broken = "--- !map:ActiveSupport::HashWithIndifferentAccess \ncheckbox: \"1\"\nchoice: \"Round Paper Clips  \\xEF\\xBC\\x88\\xE5\\x9B\\x9E\\xE5\\xBD\\xA2\\xE9\\x92\\x88\\xEF\\xBC\\x89\\r\\n\"\ninfo: \"10\\xE7\\x9B\\x92\"\n"
 
In order to convert this string to a ruby hash, I deserialize it with YAML.load: 
broken_hash = YAML.load(broken)
 
This returns a hash with garbled contents: 
{"checkbox"=>"1", "choice"=>"Round Paper Clips  ï¼\u0088å\u009B\u009Eå½¢é\u0092\u0088ï¼\u0089\r\n", "info"=>"10ç\u009B\u0092"}
 
The garbled stuff is supposed to be UTF8-encoded Chinese. broken_hash['info'].encoding tells me that ruby thinks this is #<Encoding:UTF-8>. I disagree. 
Interestingly, all other strings that were never serialized look fine. In the same record, a different field contains Chinese characters that look just right in the rails console, the psql console, and the browser. Every string saved to the database since the update, whether part of a serialized hash or a plain string, looks fine too.
 
Despite ruby's claim that this was already UTF-8, I tried to convert the garbled text from a possibly wrong encoding (like GB2312 or ANSI) to UTF-8, and of course I failed. This is the code I used:
require 'iconv'
Iconv.conv('UTF-8', 'GB2312', broken_hash['info'])
 
This fails because ruby doesn't know what to do with illegal sequences in the string. 
I really just want to run a script to fix all the old, presumably broken serialized hash strings and be done with it. Is there a way to convert these broken strings to something resembling Chinese again? 
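One possible approach, offered as a sketch rather than a verified fix: the garbled output above looks as if each original UTF-8 byte was turned into the Unicode character with the same number (for example, byte 0xE7 became "ç"). If that assumption holds, re-encoding the string to ISO-8859-1 recovers the original bytes, which can then be relabeled as UTF-8. The helper name `repair_latin1_mojibake` is hypothetical and not part of Ruby, Rails, or any YAML library.

```ruby
# Hypothetical helper: undo "one byte became one code point" garbling.
# Assumption: garbled strings contain only characters below U+0100.
def repair_latin1_mojibake(value)
  case value
  when String
    begin
      # Map each character back to a single byte, then relabel as UTF-8.
      candidate = value.encode('ISO-8859-1').force_encoding('UTF-8')
      # Keep the original if the recovered bytes are not valid UTF-8.
      candidate.valid_encoding? ? candidate : value
    rescue EncodingError
      # Characters outside ISO-8859-1 (e.g. real Chinese) are left alone.
      value
    end
  when Hash
    # Note: this returns a plain Hash, whatever Hash subclass came in.
    value.each_with_object({}) { |(k, v), out| out[k] = repair_latin1_mojibake(v) }
  when Array
    value.map { |v| repair_latin1_mojibake(v) }
  else
    value
  end
end

garbled = { "checkbox" => "1", "info" => "10\u00E7\u009B\u0092" }
repaired = repair_latin1_mojibake(garbled)
puts repaired["info"]   # prints 10盒
```

Strings that are already correct Chinese raise a conversion error inside the helper and are returned unchanged, so running it over mixed data should be fairly safe. A legitimate Latin-1 string whose bytes happen to form valid UTF-8 would still be altered, so checking the results on a copy of the data first seems prudent.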
 
I just played with the encoded UTF-8 string in the raw string (called "broken" in the above example). This is the Chinese string that is encoded in the serialized string: 
chinese = "\\xEF\\xBC\\x88\\xE5\\x9B\\x9E\\xE5\\xBD\\xA2\\xE9\\x92\\x88\\xEF\\xBC\\x89\\r\\n"
I noticed that it is easy to convert this to a real UTF-8 encoded string by unescaping it (removing the escape backslashes). 
chinese_ok = "\xEF\xBC\x88\xE5\x9B\x9E\xE5\xBD\xA2\xE9\x92\x88\xEF\xBC\x89\r\n" 
This returns a proper UTF-8-encoded Chinese string: "（回形针）\r\n" 
The thing falls apart only when I use YAML.load(...) to convert the string to a ruby hash. Maybe I should process the raw string before it is fed to YAML.load. Just makes me wonder why this is so... 
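As for the why: in YAML double-quoted scalars, `\xNN` is defined as an escape for the Unicode character with code NN, not for a raw byte, so a parser that follows the spec produces "ç" for `\xE7`. Below is a minimal sketch of the preprocessing idea, assuming the stored text only uses `\xNN` escapes to spell out UTF-8 byte sequences. The function name `unescape_hex_bytes` is hypothetical, and the sample document omits the `!map:ActiveSupport::HashWithIndifferentAccess` tag so that it loads without Rails.

```ruby
require 'yaml'

# Hypothetical helper: turn runs of literal \xNN escapes into real bytes
# before the YAML parser sees them.
# Assumption: no escaped backslash directly precedes an "x" in the data.
def unescape_hex_bytes(raw)
  raw.gsub(/(?:\\x[0-9A-Fa-f]{2})+/) do |run|
    bytes = run.scan(/\\x([0-9A-Fa-f]{2})/).flatten.map { |hex| hex.to_i(16) }
    decoded = bytes.pack('C*').force_encoding('UTF-8')
    # Leave the escapes untouched if the bytes are not valid UTF-8.
    decoded.valid_encoding? ? decoded : run
  end
end

raw = "checkbox: \"1\"\ninfo: \"10\\xE7\\x9B\\x92\"\n"
fixed_hash = YAML.load(unescape_hex_bytes(raw))
puts fixed_hash["info"]   # prints 10盒
```

Other escapes such as `\r` and `\n` are left for the YAML parser to handle. The rewritten text could also be saved back to the database so that later loads no longer depend on the preprocessing step.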
 
Interesting! This is likely due to the YAML engine "psych" that's used by default now in 1.9.3. I switched to the "syck" engine with YAML::ENGINE.yamler = 'syck' and the broken strings are correctly parsed.

Source: https://stackoverflow.com/questions/8558101

Updated: 2023-05-29 06:05

Best answer

You're going to be paying a fixed additional cost per call to use OpenGL from Java source code. Android provides Java-language wrappers around the calls. For example, if you call glDrawArrays, you're calling a native method declared in GLES20.java, which is defined in android_opengl_GLES20.cpp. You can see from the code that it's just forwarding the call with minimal overhead. 
Nosing around in the file, you can see other calls that perform additional checks and hence are slightly more expensive. 
The bottom line as far as unavoidable costs goes is that the price of making lots of GLES calls is higher with Java source than native source. (While looking at the performance of Android Breakout with systrace I noticed that there was a lot of CPU overhead in the driver because I was doing a lot of redundant state updates. The cost of doing so from native code would have been lower, but the cost of doing zero work is less than the cost of doing less work.) 
The deeper question has to do with whether you need to write your code so differently (e.g. to avoid allocations) that you simply can't get the same level of performance. You do have to work with direct ByteBuffer objects rather than simple arrays, which may require a bit more management on your side. But aside from the current speed differences between compute-intensive native and Java code on Android, I'm not aware of anything that fundamentally prevents good performance from strictly-Java implementations.
