Wednesday, March 16, 2011

What is the deal with the Unicode character 首 (U+9996), and how do Java and MySQL deal with it and its friends?

Man, this character encoding hole just keeps on getting deeper. Sigh. OK, check this out: I have a Java String that contains the Unicode character U+9996 (that's what I get if I call codePointAt()). If I look at it in the debugger's Expressions panel (in Eclipse), all is well and it looks like "首". However, if I print it to the console I get simply "?". The font doesn't seem to be the problem, as I've tried setting that differently.

My real problem is that I'm trying to put the string into a MySQL database (with utf8 encoding). Lots of other wide characters show up fine in the db but, again, this one and some others like it show up as "?". All of which leads me to believe that the problem is on the Java side.

In chasing down this bug I've learnt a little about Unicode Normalization and java.text.Normalizer which looks like it might be relevant in this case. I've learnt that U+9996 is the canonical version of U+2FB8. U+2FB8 has exactly the same problems above though as regards display and anyway why would I want to transform to a non-canonical representation (even if I could, which I don't think I can)?
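A minimal sketch of how the two code points relate, written in JavaScript rather than Java (java.text.Normalizer offers the same four normalization forms). It assumes, as the Unicode character data indicates, that U+2FB8 (KANGXI RADICAL HEAD) carries a compatibility mapping to U+9996 rather than a canonical one, so only the compatibility forms fold it. The names `radical`, `ideograph` and `toHex` are illustrative.

```javascript
// U+2FB8 is the Kangxi radical; U+9996 is the ordinary CJK ideograph.
const radical = "\u2FB8";
const ideograph = "\u9996";

// Illustrative helper: list the code points of a string as U+XXXX labels.
function toHex(text) {
  return Array.from(text, ch =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

const nfc = radical.normalize("NFC");   // canonical composition
const nfkc = radical.normalize("NFKC"); // compatibility composition

console.log(toHex(nfc));          // [ 'U+2FB8' ]  (unchanged)
console.log(toHex(nfkc));         // [ 'U+9996' ]  (folded to the ideograph)
console.log(nfkc === ideograph);  // true
```

Either way, normalization only changes which code point is stored; it does not affect whether a console or a connection can represent that code point.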

Anyway, there's one potential clue I've found which I've been unable to comprehend. This page contains the words "U+9996 is not a valid unicode character" with no further explanation. It then proceeds to show how to encode this supposedly non-valid unicode character in various unicode encodings. So my question is this basically: WTF?


UPDATES

  • I'm on a Mac.
  • I'm talking about the Eclipse console.
    • I set the console encoding to UTF-8 under Run > Common
    • I added -Dfile.encoding=UTF-8 to the JVM arguments (the default was MacRoman)
    • The console (Eclipse and Terminal.app) now show the right chars. Hooray!
  • I'm mostly interested in the data getting into the database correctly though of course I'd like to get a total understanding of what's going on here.
  • I think I've fixed the database problem. I forgot to set the encoding on the connection. Now I don't understand why some asian characters were getting through and not others.
  • Phew, stackoverflow moves fast. It's hard to keep up. Thanks people.
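To illustrate why a default charset such as MacRoman or ISO-8859-1 can turn 首 into "?", here is a rough JavaScript sketch. `encodeLossy` and `hex` are hypothetical helpers: `encodeLossy` only imitates what a legacy single-byte encoder typically does with a character it cannot represent (substitute the byte for "?"), which is an assumption about the general mechanism and not the actual Java or MySQL code path. TextEncoder and TextDecoder are the real UTF-8 APIs.

```javascript
// Hypothetical stand-in for a legacy single-byte encoder: it can only
// represent U+0000..U+00FF and replaces everything else with '?' (0x3F).
function encodeLossy(text) {
  const bytes = [];
  for (const ch of text) {            // iterates by code point
    const cp = ch.codePointAt(0);
    bytes.push(cp <= 0xff ? cp : 0x3f);
  }
  return Uint8Array.from(bytes);
}

// Illustrative helper: render bytes as space-separated hex pairs.
function hex(bytes) {
  return Array.from(bytes, b =>
    b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

const sample = "caf\u00E9 \u9996";
const utf8Bytes = new TextEncoder().encode(sample);
const lossyBytes = encodeLossy(sample);

console.log(hex(utf8Bytes));   // 63 61 66 C3 A9 20 E9 A6 96
console.log(hex(lossyBytes));  // 63 61 66 E9 20 3F
console.log(new TextDecoder("utf-8").decode(utf8Bytes) === sample); // true
```

Once the character has been replaced by 0x3F, the original is gone for good, which is why the encoding has to be right at every hop (JVM default, console, database connection).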
From stackoverflow
  • I don't know about the problems, but it's definitely a valid Unicode character (and has been since Unicode 1.1).

    1. What O/S is this running on?
    2. What console application is it (xterm, cmd.exe, etc.)?
    3. Is the console application set for UTF-8 output?

    Regarding 3 above, which is probably the important one, I've seen similar issues using e.g. PuTTY to talk to a Linux box, where the Linux box thought I was on UTF-8, but the PuTTY session itself was set to ISO-Latin-1 (8859-1)

    Yoni : In Eclipse you can set the encoding for the console, check out the preferences.
  • Have you verified that the value that gets stored in the database is actually U+003f (question mark)? There are all sorts of conventions for how to display characters that don't exist in the chosen font, and displaying them as '?' is fairly common.

    So most likely, the character gets stored correctly, and for whatever reasons, simply gets displayed as '?'. Basically, ignore how it gets rendered, and look at what codepoint gets stored in the database. Is it U+9996 or U+003f (or something else entirely)? Don't blindly assume that just because it gets rendered as a question mark, it is actually a question mark that is stored in the database.

    Rowan : How do I verify the value in the database is correct? I don't see a SQL function to show codepoints.
    Darryl Braaten : Read it back out with a java function and verify it at that point.
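One way to follow this advice is to inspect code points or raw bytes instead of the rendered glyph. On the SQL side, MySQL's HEX() function can dump the bytes of a column value. The JavaScript sketch below shows how such a dump could be interpreted; `codePointsOf` and `decodeHexDump` are hypothetical helper names, and the two hex strings are the values you would expect for a correctly stored 首 and for a literal question mark.

```javascript
// Illustrative helper: list the code points of a string as U+XXXX labels.
function codePointsOf(text) {
  return Array.from(text, ch =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

// Illustrative helper: turn a hex dump such as "E9A696" back into text.
function decodeHexDump(hexDump) {
  const pairs = hexDump.match(/[0-9a-fA-F]{2}/g) || [];
  const bytes = Uint8Array.from(pairs, p => parseInt(p, 16));
  // fatal: true makes invalid UTF-8 throw instead of becoming U+FFFD.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

console.log(codePointsOf(decodeHexDump("E9A696"))); // [ 'U+9996' ]
console.log(codePointsOf(decodeHexDump("3F")));     // [ 'U+003F' ]
```

If the dump shows 3F, the character was lost before it reached the table; if it shows E9A696, the data is fine and only the display is wrong.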

.net 2.0 security configuration

Are there any help resources, or can anyone give me a brief idea of how I would configure the .NET 2.0 runtime security policies for the following scenario:

I have a windows forms control hosted in IE. The control tries to read from a serial port and write to the event log. Both of these operations fail due to security restrictions in the browser:

Request for the permission of type 'System.Security.Permissions.SecurityPermission, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' failed.

Request for the permission of type 'System.Diagnostics.EventLogPermission, System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' failed.

I've set my site to be fully trusted by adding it to the list of fully trusted sites in IE, but I still have the problem. I am pretty sure the answer is in the runtime security policies in the .NET 2.0 configuration, but I just don't know what to change.

From stackoverflow
  • Since (I assume) you're running under IIS, you need to make sure that your IUSR_machinename account is in a security group that has the permissions you need, or that it is not in a group like "Guests," which would have restricted access to things like the system Event Log. Look at the permissions/groups for your IUSR, IWAM, and ASP.NET accounts on the system you're trying to run on. I think that's where your problem lies.

    AJ : I appreciate the down-vote. Any feedback as to why I got it?
    Jeremy : PITADeveloper, not sure about the down-vote, I didn't do it. But the permissions we're talking about would be ones on the client machine; what you're talking about are permissions on the server, which I don't think would solve my problem because the code being restricted is running on the client.
    AJ : Ah, didn't realize that. Thank you for your insight. I just like to learn, that's all, and a downvote without comment doesn't teach me anything.
  • Try the following:

    Control Panel -> Administrative Tools -> Microsoft .NET Framework 2.0 Configuration

    Expand out and select:

    .Net Framework 2.0 Configuration -> MyComputer -> Runtime Security Policy -> Machine -> Code Groups -> All_Code

    Under that there are the different zones. I'm not sure which one would apply to IE and your scenario, but try the following on each of them (remember to set them back after testing)

    Right click on the "??????_Zone". Select the "Permission Set" tab. Change the "Permission Set" combo to "Full Trust".

    The "Membership Permission" tab may also lend some clues to which one is correct.

    Info on Code Access Security

    badbadboy : Wanted to write the same as you, but didn't find Microsoft .NET Framework 2.0 Configuration on my machine... :) weird..
    GSerg : Yeah, there's only Microsoft .NET Framework 1.1 Configuration despite I've got all of them installed and I'm an administrator. Would like to know the answer as well!
    Robert Wagner : Looks like you have a common problem. See http://home.hot.rr.com/graye/Articles/CodeAccessSecurity.htm#Install
    GSerg : "Looks like you have a common problem." Too common, then. I've got VS2008 on my PC and there's no 2.0 Configuration applet, I've got VS2008 on my work PC and there's no applet there either, and I've got VS2008 on yet another PC which didn't previously have VS2005 and still no applet. I'm confused.

Any downside to "WITH RECOMPILE" for monthly SQL Server stored proc processes?

I think the question says it all. I have several monthly processes in stored procedures which take anywhere from a minute to an hour. If I declare them WITH RECOMPILE, an execution plan will be generated each time.

If the underlying indexes or statistics or views are changed by the DBA, I don't want anyone to have to go in and force a recompile of the SPs with an ALTER or whatever.

Is there any downside to this?

From stackoverflow
  • Under the circumstances, it would be completely harmless, and probably a good idea.

  • As I understand it, an SP should be re-compiled if needed automatically. So your concern about underlying changes doesn't really matter.

    However, the server tries to cache compiled SP plans. Using WITH RECOMPILE will free the memory that would have been used to cache the compiled procedures (at least until the next time the cache is cleared). Since they're only run monthly this seems like a good idea.

    Also, you might want to look at this article for other reasons to use that directive:
    http://articles.techrepublic.com.com/5100-10878_11-5662581.html

    Cade Roux : What if there are newer, better indexes it could use, or if there were new statistics - will the old plans be automatically discarded?
    Jim McLeod : No, the only time a plan will be invalidated is when indexes or statistics used by the plan are modified. See the topic "Execution Plan Caching and Reuse" (look up "procedure cache" in the index), section "Recompiling Execution Plans" in Books Online for more details about what invalidates plans.
  • If each stored procedure is only run once per month, it is highly unlikely that the compiled procedure will still be in the procedure cache. Effectively it will be recompiling anyway.

    Even if you run the same stored procedure 100 times on your reporting day, it will only take 0-2 seconds to compile each time (depending on the complexity of the stored procedure), so it's not a massive overhead. I'd feel comfortable setting WITH RECOMPILE on those stored procedures.

    Cade Roux : Right - I'm thinking the recompile time is always negligible compared to the run time - and the run time stops me from running these processes often enough for it to even be a factor.

Event handlers inside a Javascript loop - need a closure?

I'm working with a bit of html and Javascript code that I've taken over from someone else. The page reloads a table of data (via an asynchronous request) every ten seconds, and then re-builds the table using some DOM code. The code in question looks something like this:

var blah = xmlres.getElementsByTagName('blah');
for(var i = 0; i < blah.length; i++) {
    var td = document.createElement('td');
    var select = document.createElement('select');
    select.setAttribute("...", "...");
    select.onchange = function() {
        onStatusChanged(select, callid, anotherid);
    };
    td.appendChild(select);
}

When the onchange event is fired for a <select> element however, it seems like the same values are being passed to the onStatusChanged() method for every <select> in the table (I've verified that in each iteration of the loop, callid and anotherid are being given new, distinct values).

I suspect this is occurring because of how I am setting the event handler, with the select.onchange = function() syntax. If I understand it correctly, this syntax sets the onchange handler to a closure that refers to these two variables, and they end up holding whatever they were set to on the last iteration of the loop. When the event fires, the value referenced by callid and anotherid is the value set in the last iteration, not the value set in the individual iteration.
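The behaviour described above can be reproduced without any DOM. In this small sketch, `buildHandlers` is a hypothetical stand-in for the table-building loop and the numbers stand in for the callid values; it assumes the variables are declared with `var`, as in the loop shown.

```javascript
// Each handler closes over the same function-scoped variable, so all of
// them see whatever value it holds when they are eventually called.
function buildHandlers(ids) {
  var handlers = [];
  for (var i = 0; i < ids.length; i++) {
    var callid = ids[i];          // `var`: one shared variable for the whole function
    handlers.push(function () {
      return callid;              // read at call time, not at creation time
    });
  }
  return handlers;
}

const seen = buildHandlers([10, 20, 30]).map(handler => handler());
console.log(seen); // [ 30, 30, 30 ]
```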

Is there a way that I can copy the value of the parameters I am passing to onStatusChanged()?

I've changed the title to better reflect the question and the accepted answer.

From stackoverflow
  • You do, indeed, need to implement a closure here. This should work (let me know - I didn't test it)

    var blah = xmlres.getElementsByTagName('blah');
    for(var i = 0; i < blah.length; i++) {
        var td = document.createElement('td');
        var select = document.createElement('select');
        select.setAttribute("...", "...");
        select.onchange = function(s,c,a)
        {
            return function()
            {
                onStatusChanged(s,c,a);
            }
        }(select, callid, anotherid);
        td.appendChild(select);
    }
    
    matt b : this does indeed work - thanks!
    matt b : Also another solution I discovered - a workaround, really - would be to store the callid and anotherid as attributes in the DOM element, and inside the eventhandler just reference those via this.getAttribute()
    le dorfier : I wouldn't call it a workaround. What I would expect is to have the handler set once (not every time through the loop) and have it discover the variables itself.
  • This is the right answer. Had a similar issue and the closure solution was the ticket.
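For comparison, here are two other ways to give each handler its own copy of the values, sketched without the DOM. `onStatusChangedStub`, `makeHandler`, `buildWithFactory` and `buildWithConst` are hypothetical names; the stub merely returns its arguments so the result can be inspected. The second variant relies on block-scoped `const`, which requires an ES2015-capable engine.

```javascript
// Stand-in for the real onStatusChanged(); it just reports what it received.
function onStatusChangedStub(callid, anotherid) {
  return callid + ":" + anotherid;
}

// Variant 1: a named factory function. Each call creates a fresh scope,
// so the returned handler keeps its own callid and anotherid.
function makeHandler(callid, anotherid) {
  return function () {
    return onStatusChangedStub(callid, anotherid);
  };
}

function buildWithFactory(rows) {
  var handlers = [];
  for (var i = 0; i < rows.length; i++) {
    handlers.push(makeHandler(rows[i].callid, rows[i].anotherid));
  }
  return handlers;
}

// Variant 2: block-scoped bindings. `const` inside the loop body creates
// new variables on every iteration, so no wrapper function is needed.
function buildWithConst(rows) {
  const handlers = [];
  for (const row of rows) {
    const { callid, anotherid } = row;
    handlers.push(() => onStatusChangedStub(callid, anotherid));
  }
  return handlers;
}

const rows = [{ callid: 1, anotherid: "a" }, { callid: 2, anotherid: "b" }];
console.log(buildWithFactory(rows).map(h => h())); // [ '1:a', '2:b' ]
console.log(buildWithConst(rows).map(h => h()));   // [ '1:a', '2:b' ]
```

The factory version is the same idea as the immediately invoked function in the accepted answer, just with a name, which tends to read more easily inside a long loop.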

Class usage in ORM based projects

This question is about "best use" scenarios in projects with ORMs like NHibernate, Subsonic, Linq2SQL, etc...

All of these tools generate basic entity classes, some with attributes, others without. Do people use these classes as their business classes? Or is there wholesale copying of data from ORM generated classes to manually created business classes?

Thanks.

From stackoverflow
  • I tend to work the opposite way. I create the business objects the way I need them, and create NHibernate mappings from my objects to data. You can have NHibernate generate a schema for you based on your mappings, or you can create your own schema, and create the mappings to go between the two. Linq2Sql and Entity Framework do not support this. I can't speak to Subsonic on this point.

    I usually create my business classes, and get the application at least partially running without any database at all. This way I can develop a better understanding of what it is the application is supposed to do, and how it should behave before making the decision of how to persist the objects.

    Justice : This is the way to go. In fact, the whole point of OOP and ORM is so that you are free to work with business objects (the domain model)! You can use tools like NHibernate to persist your business objects transparently, but your application code does not need to deal at all with persistence.
  • There are a couple of solutions for all of the tools mentioned; the answer is that it depends on the scope of your project.

    Here's a similar question about LINQ to SQL that I answered a little while ago.

    Hope that helps!

  • I normally use the entities directly in the business and presentation layers. The data layer defines the entity, the business layer manipulates the entity or queries a list of entities, and the presentation layer displays the entities.

    I think that creating separate business objects and copying data between the two would be a lot of unneeded overhead. But if you find that you have to do this, I'd recommend just wrapping the entities instead of copying data back and forth. You can hide the entity and use Properties to expose members and alter behavior.
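The wrapping idea can be sketched in a language-neutral way. The example below uses JavaScript instead of C#, and every name in it (`EmployeeEntity`, `Employee`, `salaryCents`, `raise`) is hypothetical; it only illustrates hiding a generated entity behind a business class that exposes properties and behaviour, without copying data.

```javascript
// Hypothetical ORM-generated entity: plain data, one field per column.
class EmployeeEntity {
  constructor(firstName, lastName, salaryCents) {
    this.firstName = firstName;
    this.lastName = lastName;
    this.salaryCents = salaryCents;
  }
}

// Business-layer wrapper: holds the entity privately and exposes only
// the members and behaviour the rest of the application should see.
class Employee {
  #entity;

  constructor(entity) {
    this.#entity = entity;
  }

  get fullName() {
    return `${this.#entity.firstName} ${this.#entity.lastName}`;
  }

  get salary() {
    return this.#entity.salaryCents / 100;
  }

  // A business rule lives here, not on the generated class.
  raise(percent) {
    if (percent <= 0) throw new RangeError("percent must be positive");
    this.#entity.salaryCents =
      Math.round(this.#entity.salaryCents * (1 + percent / 100));
  }
}

const entity = new EmployeeEntity("Ada", "Lovelace", 500000);
const employee = new Employee(entity);
employee.raise(10);
console.log(employee.fullName);   // Ada Lovelace
console.log(employee.salary);     // 5500
console.log(entity.salaryCents);  // 550000 (the wrapped entity was updated in place)
```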

  • SubSonic and Linq2Sql are one-to-one ORM mappers. Consider the situation where the database is normalized. For example, an employee's info is broken down into 3 different tables, but in your domain model you'd want a single Employee object to represent that info. This is where SubSonic and Linq2Sql fail. NHibernate allows you to map your domain object to multiple tables. You'd also want to stay away from auto-generated code. NHibernate allows you to define your own POCO (Plain Old C# Object) domain and offers different ways to map it to table(s) in the database.

    Ahmad : Correct, Shiraz. To isolate yourself from one-to-one mapping, you would have to use SQL and abstract yourself at that level.

Programmatically use Gmail to receive e-mail?

I'd like to use a C# program to poll a gmail account and automatically download new messages. I know you can use gmail as an outbound SMTP server, but is there any way to access new messages sent to the account?

EDIT: Thanks for the rapid feedback....so I have two options, POP or IMAP.

Which one should I use? And why?

EDIT #2: Looks like IMAP allows me to not have to poll. Looks like the way to go.

From stackoverflow
  • You can configure GMail to let you get at your mail with a POP3 client.

  • Use Gmail with IMAP.

    Matt Cruikshank : Sure - but with what libraries does Jonathan use IMAP?
    FlySwat : I think I'll use this one: http://www.codeplex.com/InterIMAP
    Martin Vobr : What about this one? Shows how to get list of unread messages (IMAP, VB.NET) http://blog.rebex.net/news/archive/2007/05/28/howto-get-list-of-unread-messages-from-an-imap-server-in-vb-net.aspx Shows how to get message list from POP3 server (C#) http://blog.rebex.net/news/archive/2007/05/14/howto-download-emails-from-gmail-account-in-csharp.aspx
  • By the way, there are instructions on how to use SMTP with SSL with GMail: http://www.mono-project.com/FAQ:_Security#Does_SSL_works_for_SMTP.2C_like_GMail_.3F

  • You can get an Atom Feed of your GMAIL, which can be fetched with a regular web request and parsed as a regular XML document. I made a PHP page I could access from my phone (which doesn't support Atom Feeds, or pages as advanced as Google Mobile), to show me a list of new emails.

    FlySwat : Link is broken, but cool idea.
  • There is a C# component on SourceForge that lets you do exactly that. At some point I wrote an NT Service that runs in the background and downloads files from a gmail account.

How to access the Index Of A Generic.List By Reflection??

OK, I have a class and I pass an object to it as a property.

The object that I pass is a List<X>.

In my class I'm trying to access the object's index by reflection BUT I CAN'T!!!

Example:

This class works; I've only written down the part I want to show you and need help with.

class MyClass
{
    private object _recordSet;
    public object RecordSet
    {
        get { return _recordSet; }
        set { _recordSet = value; }
    }

    public string Draw()
    {
        System.Reflection.Assembly asem = System.Reflection.Assembly.GetAssembly(_dataSource.GetType());

        object instance;

        instance = asem.CreateInstance(_dataSource.GetType().UnderlyingSystemType.FullName);

        // to access the Count of my List
        int recordCount = int.Parse(_dataSource.GetType().GetProperty("Count").GetValue(_dataSource,null));

        // I need to do a loop like this
        for(int cont = 0; cont < recordCount; cont++)
        {
            _dataSource[cont].Name; // <-- THIS PART IS NOT WORKING!!! because I can't access the index directly.... WHAT TO DO!! ???
        }
    }
}
From stackoverflow
  • If you are using reflection (and hence lots of object), why not just cast as an IList (non-generic) instead?

    i.e.

    IList list = (IList)actualList;
    object foo = list[17];
    

    Also - for your original code with Count, you don't mean int.Parse - you should just cast (since we expect Count to be an int).

    Jeff B : Add: if (actualList is IList)
    Marc Gravell : @Jeff B - I disagree; the scenario suggests that we expect it to be a list, hence if the data *isn't* an IList, I'm happy for it to raise an exception. It depends on the scenario, of course; if it was ad-hoc data-binding to either an object, an IList or an IListSource, then "as"/"is" is necessary.
  • Just cast your object to a list first, you don't need reflection here.

  • When I try to create an IList object I get a compilation error: using the generic type 'System.Collections.Generic.IList<T>' requires 1 type argument.

    @Marc Gravell it can be an IList because I just want to move through the records.

    P.S.

    Sorry about my English...

    Marc Gravell : As I tried to stress, you want the *non-generic* IList, not IList<T>
    Marc Gravell : i.e. `System.Collections.IList`, not `System.Collections.Generic.IList<T>`
    Marc Gravell : add "using System.Collections;" to the top of the file
  • IList newlist = (IList)_dataSource;

    object x = newlist[1];

    it gives me a compilation error...

  • Hey hey, it works! I was missing this part :P System.Collections.IList. Thanks dude!!