Discussion:
solr utf 16 ?
brian beard
2007-04-23 14:18:41 UTC
Are there any plans to make solr UTF-16 compliant in the future?
If so, is it in the short-term or long-term?

Ken Krugler
2007-04-23 15:58:14 UTC
Post by brian beard
Are there any plans to make solr UTF-16 compliant in the future?
If so, is it in the short-term or long-term?
I'm curious what you mean by "UTF-16 compliant". Do you mean being
able to handle UTF-16 encoded XML?

Thanks,

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"
brian beard
2007-04-23 16:34:30 UTC
Yes. I'm assuming if you have UTF-16 encoded data in a document that needs
to be added to the index, that solr would not be able to handle this?
Ken Krugler
2007-04-23 17:24:18 UTC
Post by brian beard
Post by Ken Krugler
I'm curious what you mean by "UTF-16 compliant". Do you mean being
able to handle UTF-16 encoded XML?
Yes. I'm assuming if you have UTF-16 encoded data in a document that
needs to be added to the index, that solr would not be able to
handle this?
I've never tried sending anything but UTF-8 to Solr, so I can't
comment on what issues you'll run into.

But based on my experience to date, I'd strongly suggest converting
it to UTF-8 before you post it to Solr.

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"
Mike Klaas
2007-04-23 18:13:54 UTC
Post by brian beard
Yes. I'm assuming if you have UTF-16 encoded data in a document that needs
to be added to the index, that solr would not be able to handle this?
I believe that handling arbitrary encodings is on the list of future
enhancements, but I couldn't give you a timeline.

For the time being, consider that
1. utf-8 is the "lingua franca" of xml document encoding
2. it is very easy to convert it yourself (it would be a 3-4 line
python commandline filter, frinstance).
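
A minimal sketch of such a conversion in Python, for illustration only. The function names (utf16_to_utf8, convert_stream) are made up here, and rewriting the XML declaration is an added assumption: a document that still declares encoding="UTF-16" after being re-encoded would mislead the parser.

```python
import re


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes as UTF-8.

    The "utf-16" codec honours a leading byte order mark; without one it
    assumes the platform's native byte order.
    """
    text = data.decode("utf-16")
    # Assumption: if the document starts with an XML declaration that names
    # an encoding, point it at UTF-8 so it matches the new bytes.
    text = re.sub(r'\A(<\?xml[^>]*?encoding=["\'])[^"\']+', r"\1UTF-8", text)
    return text.encode("utf-8")


def convert_stream(src, dst) -> None:
    """Read all bytes from a binary stream, convert, write to another."""
    dst.write(utf16_to_utf8(src.read()))
```

To use it as a command-line filter, call convert_stream(sys.stdin.buffer, sys.stdout.buffer) from a small script and pipe the UTF-16 file through it before posting the result to Solr.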

-Mike
brian beard
2007-04-23 20:23:04 UTC
Thanks for all the comments. The conversion seems like a good alternative.
Walter Underwood
2007-04-25 17:44:10 UTC
UTF-16 support should not require any changes to the XML parsing.
All XML parsers are required to support that encoding. The real
change is implementing RFC 3023 (XML Media Types) so that the
encoding can be specified over HTTP.
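
As a rough illustration of that idea (not Solr's actual code), a server could read the charset parameter from the Content-Type header and decode the request body with it. The helper names and the UTF-8 fallback below are assumptions for this sketch. RFC 3023 has its own rules for a missing charset, which differ between text/xml and application/xml and are not modelled here.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value,
    or None when the header carries no charset."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(body: bytes, content_type: str, fallback: str = "utf-8") -> str:
    """Decode a request body using the charset given over HTTP.

    The fallback is an assumption of this sketch, not an RFC 3023 rule.
    """
    charset = declared_charset(content_type) or fallback
    return body.decode(charset)
```

For example, decode_body(raw_bytes, "text/xml; charset=UTF-16") decodes the payload as UTF-16, while a header without a charset falls back to UTF-8 in this sketch.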

wunder