How do I handle Unicode user input in Scala safely (esp. XML entities)?

Questions

On my website I have a form that takes in some textual user input. All works fine for "normal" characters. However, when Unicode characters are input... well, the plot thickens.

The user inputs something like:

やっぱ死にかけてる

This comes in to the server as text containing XML entity refs:

&#12420;&#12387;&#12401;&#27515;&#12395;&#12363;&#12369;&#12390;&#12427;?

Now, when I want to serve this back to the client in HTML, how do I do it?

If I simply output the string as it is, there could be a chance for a script attack. If I try to encode it with scala.xml.Text it gets converted to:

&amp;#12420;&amp;#12387;&amp;#12401;&amp;#27515;&amp;#12395;&amp;#12363;&amp;#12369;&amp;#12390;&amp;#12427;?

Is there a better ready-made solution in Scala which can detect entity refs and not escape them, yet escape XML tags?

Answers

Parse the string containing entity references as a fragment of XML; the parser resolves the references into the actual Unicode characters. To output those characters safely in XML, you can be paranoid and write them back as numeric entity references, as the function escape below does:

scala> import xml.parsing.ConstructingParser
import xml.parsing.ConstructingParser

scala> import io.Source
import io.Source

scala> val d = ConstructingParser.fromSource(Source.fromString("<dummy>&#12420;</dummy>"), true).document()
d: scala.xml.Document = <dummy>や</dummy>

scala> val t = d(0).text
t: String = や

scala> import xml._
import xml._

scala> def escape(xmlText: String): NodeSeq = {
     |   def escapeChar(c: Char): xml.Node =
     |     if (c > 0x7F || Character.isISOControl(c))
     |       xml.EntityRef("#" + Integer.toString(c, 10))
     |     else
     |       xml.Text(c.toString)
     | 
     |   new xml.Group(xmlText.map(escapeChar(_)))
     | }
escape: (xmlText: String)scala.xml.NodeSeq

scala> <foo>{escape(t)}</foo>                            
res3: scala.xml.Elem = <foo>&#12420;</foo>
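
The escape function above maps over individual Char values, so a character outside the Basic Multilingual Plane (stored as a surrogate pair) would come out as two separate surrogate references. As a rough sketch only, the same decode-then-escape idea can be written against the standard library alone, iterating by code point. It assumes that only numeric character references (decimal or hexadecimal) need decoding and that the result is placed in element text or a double-quoted attribute. The names EntitySafeEscape, decodeNumericRefs, escapeText and sanitize are made up for this illustration and are not part of scala.xml.

```scala
import java.util.regex.Matcher

// Hypothetical helper object: not part of scala.xml or any library.
object EntitySafeEscape {
  // Decimal (&#12420;) or hexadecimal (&#x3084;) numeric character references.
  private val NumericRef = """&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));""".r

  /** Replace numeric character references with the characters they denote.
    * Named entities (&amp;, &lt;, ...) are deliberately left untouched here. */
  def decodeNumericRefs(s: String): String =
    NumericRef.replaceAllIn(s, m => {
      val cp =
        if (m.group(1) != null) Integer.parseInt(m.group(1), 16)
        else Integer.parseInt(m.group(2), 10)
      val isSurrogate = cp >= 0xD800 && cp <= 0xDFFF
      val out =
        if (cp > 0 && Character.isValidCodePoint(cp) && !isSurrogate)
          new String(Character.toChars(cp))
        else
          m.matched // not a usable code point: keep the text as typed
      // The replacement string treats '$' and '\' specially, so quote it.
      Matcher.quoteReplacement(out)
    })

  /** Escape markup characters and write every non-ASCII or control
    * code point as a decimal numeric character reference. */
  def escapeText(s: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i) // joins a surrogate pair into one code point
      if (cp == '<') sb.append("&lt;")
      else if (cp == '>') sb.append("&gt;")
      else if (cp == '&') sb.append("&amp;")
      else if (cp == '"') sb.append("&quot;")
      else if (cp > 0x7F || Character.isISOControl(cp)) sb.append("&#" + cp + ";")
      else sb.append(cp.toChar)
      i += Character.charCount(cp)
    }
    sb.toString
  }

  /** Decode incoming numeric references first, then escape everything. */
  def sanitize(s: String): String = escapeText(decodeNumericRefs(s))
}
```

With this sketch, EntitySafeEscape.sanitize("&#12420;<script>") gives &#12420;&lt;script&gt;, so the entity reference survives while the tag is neutralised. Decoding first means a reference such as &#60; becomes a literal < and is then escaped like any other, rather than being trusted as already-safe text.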

Source

License : cc by-sa 3.0

http://stackoverflow.com/questions/2033833/how-do-i-handle-unicode-user-input-in-scala-safely-esp-xml-entities
