Tuesday, April 5, 2011

BeautifulSoup 3.1 parser breaks far too easily

I was having trouble parsing some dodgy HTML with BeautifulSoup. Turns out that the HTMLParser used in newer versions is less tolerant than the SGMLParser used previously.


Does BeautifulSoup have some kind of debug mode? I'm trying to figure out how to stop it borking on some nasty HTML I'm loading from a crabby website:

<HTML>
    <HEAD>
        <TITLE>Title</TITLE>
        <HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
    </HEAD>
    <BODY>
        ...
        ...
    </BODY>
</HTML>

BeautifulSoup gives up after the <HTTP-EQUIV...> tag:

In [1]: print BeautifulSoup(c).prettify()
<html>
 <head>
  <title>
   Title
  </title>
 </head>
</html>

The problem is clearly the HTTP-EQUIV tag, which is really a very malformed <META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"> tag. Evidently, I need to specify this as self-closing, but no matter what I specify I can't fix it:

In [2]: print BeautifulSoup(c,selfClosingTags=['http-equiv',
                            'http-equiv="pragma"']).prettify()
<html>
 <head>
  <title>
   Title
  </title>
 </head>
</html>

Is there a verbose debug mode in which BeautifulSoup will tell me what it is doing, so I can figure out what it is treating as the tag name in this case?

From stackoverflow
  • Your problem must be something else; it works fine for me:

    In [1]: import BeautifulSoup
    
    In [2]: c = """<HTML>
       ...:     <HEAD>
       ...:         <TITLE>Title</TITLE>
       ...:         <HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
       ...:     </HEAD>
       ...:     <BODY>
       ...:         ...
       ...:         ...
       ...:     </BODY>
       ...: </HTML>
       ...: """
    
    In [3]: print BeautifulSoup.BeautifulSoup(c).prettify()
    <html>
     <head>
      <title>
       Title
      </title>
      <http-equiv>
      </http-equiv>
     </head>
     <body>
      ...
            ...
     </body>
    </html>
    
    
    In [4]:
    

    This is Python 2.5.2 with BeautifulSoup 3.0.7a — maybe it's different in older/newer versions? This is exactly the kind of soup BeautifulSoup handles so beautifully, so I doubt it's been changed at some point… Is there something else to the structure that you haven't mentioned in the problem?

    Mat : I've got Python 2.5.1 and BeautifulSoup 3.1.0.1. The original broken structure was different, but the problem also occurs with the simplified structure in the question. I have just run the code in your example and have the same problem as before: nothing after the title. Now I'm really confused!
    ShreevatsaR : One possibility is that BeautifulSoup broke something when updating... did you try with the text copied exactly from your question here?
    ShreevatsaR : http://www.crummy.com/software/BeautifulSoup/CHANGELOG.html BeautifulSoup 3.1 is based on HTMLParser rather than SGMLParser (as the latter is gone in Python 3.0), which *might* be the problem here. That's sad...
    Mat : Yes I tried with the exact text in the question, and I've just copied-and-pasted again to be sure. Sounds like a pain that the parser has changed. Perhaps I should drop a quick regular expression in to zap the borked HTML. It's not like I'm going to come across anything similar elsewhere.
    John Fouhy : Confirming ShreevatsaR's results with BeautifulSoup 3.0.7a...
  • The Beautiful Soup page "Having problems with Beautiful Soup 3.1.0?" recommends using html5lib's parser as one of the workarounds.

    #!/usr/bin/env python
    from html5lib import HTMLParser, treebuilders
    
    parser = HTMLParser(tree=treebuilders.getTreeBuilder("beautifulsoup"))
    
    c = """<HTML>
        <HEAD>
            <TITLE>Title</TITLE>
            <HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
        </HEAD>
        <BODY>
            ...
            ...
        </BODY>
    </HTML>"""
    
    soup = parser.parse(c)
    print soup.prettify()
    

    Output:

    <html>
     <head>
      <title>
       Title
      </title>
     </head>
     <body>
      <http-equiv="pragma" content="NO-CACHE">
       ...
            ...
      </http-equiv="pragma">
     </body>
    </html>
    

    The output shows that html5lib hasn't fixed the problem in this case though.

  • Try lxml (and its html module). Despite its name, it is also for parsing and scraping HTML. It's much, much faster than BeautifulSoup, and it even handles "broken" HTML better than BeautifulSoup. It has a compatibility API for BeautifulSoup too if you don't want to learn the lxml API.

    Ian Bicking agrees.

    There's no reason to use BeautifulSoup anymore, unless you're on Google App Engine or something where anything not purely Python isn't allowed.
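
The regular-expression workaround Mat mentions in the comments could look roughly like the sketch below. It assumes the only damage is a META tag whose element name was dropped, so that the tag starts directly with HTTP-EQUIV. The names BROKEN_META and repair_meta are made up for illustration and are not part of BeautifulSoup; the idea is simply to put the missing name back before handing the markup to whichever parser you use.

```python
import re

# Matches a tag that starts directly with an HTTP-EQUIV attribute, i.e. a
# <META ...> tag whose element name has gone missing. (Hypothetical helper,
# written for this one kind of breakage only.)
BROKEN_META = re.compile(r'<\s*(HTTP-EQUIV\s*=)', re.IGNORECASE)


def repair_meta(html):
    """Return html with '<HTTP-EQUIV=' rewritten to '<META HTTP-EQUIV='."""
    return BROKEN_META.sub(r'<META \1', html)
```

The repaired string can then be passed to the parser in place of the original. Well-formed META tags are left alone because the pattern only matches when HTTP-EQUIV follows the opening angle bracket directly.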

Oracle 10g - An invisible column?

Hello,

Is it possible to 'hide' a column in an Oracle 10g database, or prevent data from being inserted altogether? We'd prefer to have existing INSERT statements still function, but have the information NOT insert for a particular column. Also, column masking or any type of encryption is not preferred.

Not sure if it's possible, but thanks in advance.

From stackoverflow
  • With Oracle's Virtual Private Database (VPD) feature you can define which users can change and which users can select a column. Virtual Private Database is also called fine-grained access control (FGAC).

  • If all you need to do is stop data from being inserted, you could create a BEFORE INSERT FOR EACH ROW trigger that wipes out any value inserted into the column before the row is saved.

    There are various other things you can do with security (also views) to prevent inserts/selects from particular users in particular circumstances, but these will probably not let existing inserts continue to work.

    : This is most likely the route we would like to take... Do you know if the original INSERT event would be logged, or just the trigger event? Thanks!
    Eric Rosenberger : Depends what you mean by "logged"... As far as the transaction log goes, it should only be a single INSERT with the values as modified by the trigger (i.e., there wouldn't be two entries), but I suppose a lower-level audit log or something might record the original version somewhere...
  • Rename the original table, then create a view with the original table name that selects only the columns you want to show.

    Recompile code referring to existing table.

  • What about just setting your grants on each column you allow users to update or insert?

    grant select on emp to scott;

    grant update (column1, column2), insert (column1) on emp to scott;

Initialize library on Assembly load

I have a .NET library DLL that acts like a functional library. There are a bunch of static types along with static methods.

There is some initialization code that I need to run to set up the library ready for use.

When the assembly gets loaded is there a way to ensure that a particular method is run? Something like AppDomain.AssemblyLoad but called automatically from the assembly itself. I was thinking that maybe there is something like an AssemblyAttribute that could be used?

At the moment I have this initialization code in a static constructor but as this is a library with many entry points there is no guarantee that this particular type will be used.

Thanks!

From stackoverflow
  • Why do you need all the data to be loaded before any of it is used, rather than just when the first type which needs it is used?

    I don't believe there's any way of forcing a method to be run on assembly load, from within the assembly. You could put a static constructor in every type, but frankly I think it just makes more sense to have a single type representing that data and providing access to it - and put a static constructor on that type alone. (If you've got separate bits of data which can be used independently, perhaps create separate types for them.)

    sixtowns : I didn't mention loading any data. I really just want to make sure that the library uses a specific DateTime converter instead of the default one. (See http://stackoverflow.com/questions/458935/extend-a-typeconverter).
    sixtowns : So I just want to run this code once when the assembly is loaded: TypeDescriptor.AddAttributes(typeof(DateTime), new TypeConverterAttribute(typeof(DateTimeConverterEx)));
    Jon Skeet : So put a static constructor in every type which uses the converter. It's unfortunate, but that's the price to be paid for static state :(
    Jon Skeet : (Apologies for assuming you meant "load" by "initialize" btw.)
    sixtowns : I thought this might be the case. The question title was misleading because it said 'data' - I've changed it. Cheers.

How do I deploy a pre-compiled ASP.NET web application?

I have a web service implemented in ASP.NET 2.0 and have pre-compiled it using the aspnet_compiler.

I have no clue now how to deploy it to IIS, can someone point me in the right direction? I am using IIS 6.0 on a Windows Server 2003 machine.

I have placed the pre-compiled files into a virtual directory, when I access the service through the browser I get the following parser error message:

Parser Error Message: The page must have a 
<%@ webservice class="MyNamespace.MyClass" ... %> directive.

I then checked the .asmx file for said directive and the contents have been changed by the compiler to:

This is a marker file generated by the precompilation tool, and should not be deleted!

UPDATE: When I place the non-pre-compiled web app in the virtual directory it works fine; when I place the pre-compiled web app in the virtual directory, I get the above error.

Anyone have any ideas!?

From stackoverflow
  • You can copy the contents of the output directory into the virtual directory on IIS and it should all just work. You will need to set up the App Pool and configure the virtual directory for ASP.NET just like a non-compiled website. Make sure you delete everything from the virtual directory first (if there was a previous non-compiled site there).

    Also you can use Web Deployment projects, which can serve as an input to an MSI installer.

    The ASMX file contents are correct. Are you sure you have configured the Virtual Directory with an Application? Open the properties window by right clicking on the virtual directory and ensure the "Application Name" (on the "Virtual Directory" tab) is set and is editable.

    Also, have you chosen the correct ASP.NET version? Check this on the "ASP.NET" tab of the properties window.

    mmattax : I revised my original post, I have already tried placing the pre-compiled app into a virtual directory...
    mmattax : I believe the virtual directory is set up correctly, it will serve the non-pre-compiled application fine, I just wanted to try to deploy a pre-compiled version...
    mmattax : Not sure if it matters, but the virtual directory is being run under a .NET 1.0 application, but I have the virtual directory set to 2.0.
    Robert Wagner : That could be it, try it. The deploy process for a non-compiled project sets up the Application Pool,etc for you. Try deploying the non-compiled first, delete the files then copy the new ones in.
  • Did you deploy the entire bin directory? You need to make sure the .COMPILED files created by aspnet_compiler are in your bin dir.

asp.net custom datapager

in all the datapager examples i've seen (using a LinqDataSource tied to a ListView) the queries to the database return the full recordset. how can i get JUST the page of rows that i want to display?

for example, if i have a table with 1 million rows and i want to see the 10 results on page 5, i would be getting rows 41 to 50 (counting pages from 1).

i'm sure i have to implement custom paging but i haven't found any good examples.

From stackoverflow
  • If you're using SQL Server 2005, take a look at this article.

    As you can see, the trick is to use the ROW_NUMBER() function, which gives you the sequential number of each row in a result set. With it you can paginate simply by selecting the range of row numbers that belongs to the page you want.

  • There are many ways of doing this, however, I personally like a SQL based solution that goes to the database and gets the result set. This 4GuysFromRolla Article does a good job of explaining it.

  • I was under the impression (from Scott Guthrie's blog post "LINQ to SQL (Part 9)") that the LinqDataSource handles the paging for you at the database level:

    One of the really cool things to notice above is that paging and sorting still work with our GridView - even though we are using a custom Selecting event to retrieve the data. This paging and sorting logic happens in the database - which means we are only pulling back the 10 products from the database that we need to display for the current page index in the GridView (making it super efficient).

    (original emphasis)

    If you are using some custom paging, you can do something like this in LINQ to SQL:

    var tagIds = (from t in Tags where tagList.Contains(t.TagText) select t.TagID).Skip(10).Take(10).ToList();
    

    This is telling LINQ to take 10 rows (equivalent to T-SQL "TOP 10"), after skipping the first 10 rows - obviously these values can be dynamic, based on the Page Size and page number, but you get the idea.

    The referenced post also talks about using a custom expression with a LinqDataSource.

    Scott has more information on Skip/Take in Part 3 as well.
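
The Skip/Take arithmetic from the last answer can be sketched in a few lines. This is only an illustration in Python, with made-up names (page_window, take_page) and an in-memory list standing in for the table; in the real application the skip and take values would be passed to Skip() and Take() so that the database does the slicing. Counting pages from 1, page 5 at 10 rows per page covers rows 41 to 50; counting from 0 it would cover rows 51 to 60.

```python
def page_window(page_number, page_size):
    """Return (skip, take) for a 1-based page number."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be at least 1")
    return (page_number - 1) * page_size, page_size


def take_page(rows, page_number, page_size):
    """Slice one page out of an in-memory sequence (stand-in for Skip/Take)."""
    skip, take = page_window(page_number, page_size)
    return rows[skip:skip + take]
```

For example, page_window(5, 10) gives a skip of 40 and a take of 10.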

iPhone Programming: Send a text message? Access contact list?

I'm still new to the API and I wanted to ask:

  • Can you send a text message programmatically?
  • Can you access the user's contact list programmatically?

I'm thinking no. I haven't seen anything about text messaging in the API, and I figure the sandboxing that the iPhone does keeps you away from the phone's contact list.

Thanks everyone.

From stackoverflow
  • 1) I'm afraid you can't send SMS with the iPhone SDK although you can make a link to send an SMS like so: <a href="sms:408-555-5555">408 555 5555</a>

    2) You can access the contact list with the Address Book UI framework

    David Grant : You can also programmatically open the Text (SMS) application by passing an sms:// URL to UIApplication's openURL method.
  • One way round the SMS/MMS problem is to use an external aggregator; then you can utilise HTTP between a server and the iPhone to send SMS. Obviously there is a cost to the developer this way.

  • external aggregator then you can utilise http

    You mean my app would use http://MySite.com?from=&to=&msg=

    And then my server-side code would send the actual text-message?

    Has anyone come up with a more "direct" method?

    Sneakyness : It's not happening. It just isn't. Read through all of Apple's documentation on getting started with iPhone programming and you'll understand why.
  • Yes, you can build an app to send SMS. All you need is a server API, which you call from your code, to send the SMS. I am saying this because I am currently working on it.

    As soon as I am done with my app, I am going to share it. http://shishir.com?from=shishir&to=shishir&message=hi&sandbox=false&username=user&password=pass

    You have to pass the values through the URL.

    I will go into more depth very soon.

    regards shishir

Displaying messages on form success page

Assume you have page A which is the "home" page for the web app. Assume there is a second page B which contains a form. After successfully processing the form the user is directed back to page A. If you need to display a success message for the previous action (the successful form submission), what is the best way to get that message for display?

I've narrowed it down to this:

  1. Pass a message key to page A. Page A will then use the key to get the message from somewhere.

  2. Pass the message to page A. However this seems to open the site up for XSS and what not.

  3. When processing the form store the message in session scope prior to redirecting to page A. Then page A can retrieve & remove the message from session and display it on the screen.

Am I missing something? What is the preferred way to accomplish this task?

From stackoverflow
  • I usually use method 3. If a page wants to display a message after a redirect, it sets a session variable. Then code in my base class (executed for every page request) checks whether there is a message to display, displays it and empties the message session variable.

  • I would never use a session for such a task. It's irresponsible and destroys the flow of logic. Instead, you could have a pre-determined list of errors and just pass the error code through the query parameters. If you really need to send new, custom data every time I would suggest sending it through a GET or a POST preferably.

    oneBelizean : How is it irresponsible? And how does it destroy logic anymore than retrieving the message with the key passed in on the url??
    Joe Philllips : If the user goes to that page directly the message will still be there most likely. I'm not sure what this page is so it's hard to say what problems you will encounter.
    oneBelizean : The only way the message would still be there is if the user didn't complete the redirect to the page. Assuming the web server handles distributing session state properly among multiple webservers (if they exist)
  • totally agreeing with d03boy here for all the same reasons. Storing data specific to a certain view in the session breaks badly the moment your users start to open multiple windows.

    Personally, I always use method 1 you've described here.

    oneBelizean : Following each successful submission the user is redirected, so having multiple windows open wouldn't necessarily be a problem. However managing the session across multiple servers may be more of an issue.
  • I agree with d03boy and pilif: method 3 is not a good use of the session and would get messed up with several windows open, and as you said, method 2 opens the door to XSS.

    Store the different messages either in a file or a database, and pass the key to the script. If you need to customize your messages, pass the data through a post request (and validate it to prevent XSS) and use patterns to replace the values in the message.
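
To make the two approaches debated above concrete, here is a small framework-neutral sketch. Everything in it is hypothetical: the MESSAGES table, the function names, and the plain dict standing in for the session. Method 1 looks a message up by a key taken from the URL, so unknown keys simply produce nothing and no user-supplied text is ever displayed. Method 3 stores the message in the session and removes it on first read.

```python
# Method 1: a fixed table of messages, looked up by a key from the URL.
MESSAGES = {
    "saved": "Your changes were saved.",
    "deleted": "The item was deleted.",
}


def message_for_key(key):
    """Return the predefined message for key, or None for unknown keys."""
    return MESSAGES.get(key)


# Method 3: a one-shot ("flash") message kept in the session.
def set_flash(session, text):
    """Store a message before redirecting; session is any dict-like object."""
    session["flash"] = text


def pop_flash(session):
    """Return the stored message and remove it, so it is shown only once."""
    return session.pop("flash", None)
```

Because pop_flash removes the entry, a later direct visit to the page finds no message, which is the behaviour oneBelizean describes in the comments.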

Visual Studio 2005 stopped adding code-behind files

This morning, when I tried to add a new ASPX page to my project, Visual Studio decided that I no longer needed any .CS files associated with it. Trying to add a web control produced same results: .ascx file with no .cs. I've got two questions so far:

  1. Considering that no changes have been made to the system over the weekend, what could be the cause of this?
  2. Is re-installing VS the only option right now?

I'm running Visual Studio 2005 SP1 on Windows XP SP3.

Thanks!

EDIT: Thank you all. The checkbox DID get unchecked at some point and I simply did not see it. I will blame this one on Monday...

From stackoverflow
  • That can be considered a feature by the ASP.NET MVC folks.

    le dorfier : Right! So what's the problem again?
  • Usually, there is a check box asking if you want to have the code in a separate file. Did this box get unchecked at some point (and so it's now the default)? It's easy to get in the habit of clicking through these common dialogs and not noticing that it may have changed.

  • I don't mean to be rude by asking an obvious question, but have you made sure that the "place code in a separate file" check box is checked when you create your page?

  • There is a check box you may have accidentally unchecked: "Place code in separate file".

Odd Autocomplete + Password Remembering Behaviour

I'm having a hard time figuring out how Firefox and Chrome determine which fields are for a password and how they autocomplete them on other forms.

For my login form, I have this:

<p>
    <label for="login_email">Email:</label><br />
    <input id="login_email" name="login[email]" size="30" type="text">
</p>

<p>
    <label for="login_password">Password:</label><br />
    <input id="login_password" name="login[password]" size="30" type="password">
    <input id="login_password_hash" name="login[password_hash]" type="hidden">
</p>

<p>
    <input id="login_submit" value="LOGIN" type="submit">
</p>

The login_password_hash field is for hashing the password on the client side with JavaScript before sending; having JavaScript disabled does not change the outcome.

And for creating a new user, I have this form:

<p>
    <label for="user_email">Email:</label>
    <input id="user_email" name="user[email]" size="30" type="text">
</p>
<p>
    <label for="user_first_name">First Name:</label>
    <input id="user_first_name" name="user[first_name]" size="30" type="text">

</p>
<p>
    <label for="user_last_name">Last Name:</label>
    <input id="user_last_name" name="user[last_name]" size="30" type="text">
</p>
<p>
    <label for="user_password">Password:</label>
    <input id="user_password" name="user[password]" size="30" type="password">
    <input id="user_password_hash" name="user[password_hash]" type="hidden">
</p>

<p><input id="user_submit" value="Create User" type="submit"></p>

Now, after saving the password from the login form and visiting the new user form, the browser puts the saved email in the last field before the password field and the saved password in the password field.

This happens on Firefox and Chrome, but not in Internet Explorer. Any ideas on why Firefox and Chrome behave this way? The two forms have nothing in common, the names and ids are all different.

From stackoverflow
  • Well all browsers are going to pick up the most obvious field names:

    • firstname
    • lastname
    • username
    • password

    And the obvious alternates of those (underscores for spaces, and some shorthand such as "user" and "pass" maybe). It's going to be up to the browser itself and what version of it as to what crazy alternates are supported.

    Samuel : Changing both the id and name attributes on both email and password fields to complete gibberish still has Firefox and Chrome autocomplete both forms.
  • If you want to disable autocomplete on an input, try using the autocomplete attribute, e.g.:

    <input type="text" name="email" autocomplete="off" />
    

    This is not guaranteed to work in all browsers, however.

    Samuel : I want autocomplete for the login page, but thanks for the suggestion.
  • Seems that changing name and id attributes has no effect on the password managers in Firefox and Chrome, all they see is a password_field and the field above it and that is good enough for them.

    So I'm just making the password field on the create-a-new-user form a text field.

    kripto_ash : This seems to be a known issue. http://code.google.com/p/chromium/issues/detail?id=1854

Searching word in vim?

I can search for a word in vim with /word. How can I exclude matches such as word1 and word2 and search only for word?

From stackoverflow
  • like this:

    /\<word\>
    

    \< means the beginning of a word, and \> means the end of a word.

    Adding @Roe's comment:
    VIM provides a shortcut for this. If you already have word on screen and you want to find other instances of it, you can put the cursor on the word and press '*' to search forward in the file or '#' to search backwards.

    roe : beaten by 19 seconds :) +1
    roe : placing your cursor over a word and pressing * (forward) or # (backward) is a shortcut
    Nathan Fellman : yes, I tried to be fast :)
  • I usually google my trusty vim cheat sheets whenever I go a long time without using it.

    http://www.viemu.com/a_vi_vim_graphical_cheat_sheet_tutorial.html http://www.tuxfiles.org/linuxhelp/vimcheat.html
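
For comparison, the same whole-word idea carries over to other regular-expression dialects. The sketch below uses Python's re module, which spells the boundary \b instead of vim's \< and \>; it is only an analogy, and find_whole_word is a made-up helper name.

```python
import re


def find_whole_word(word, text):
    """Return the start offsets where word occurs as a whole word in text."""
    # \b is Python's word-boundary assertion, playing the role of vim's
    # \< (start of word) and \> (end of word).
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [match.start() for match in pattern.finditer(text)]
```

Called with "word" on the text "word word1 word2 sword word.", it reports only the first and last occurrences, skipping word1, word2 and sword.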

XmlDocument dropping encoded characters

My C# application loads XML documents using the following code:

XmlDocument doc = new XmlDocument();
doc.Load(path);

Some of these documents contain encoded characters, for example:

<xsl:text>&#10;</xsl:text>

I notice that when these documents are loaded, &#10; gets dropped.

My question: How can I preserve <xsl:text>&#10;</xsl:text>?

FYI - The XML declaration used for these documents:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
From stackoverflow
  • Are you sure the character is dropped? Character 10 is just a line feed; it wouldn't exactly show up in your debugger window. It could also be treated as whitespace. Have you tried playing with the whitespace settings on your XmlDocument?


    If you need to preserve the encoding you only have two choices: a CDATA section or reading as plain text rather than Xml. I suspect you have absolutely 0 control over the documents that come into the system, therefore eliminating the CDATA option.

    Plain-text rather than Xml is probably distasteful as well, but it's all you have left. If you need to do validation or other processing you could first load and verify the xml, and then concatenate your files using simple file streams as a separate step. Again: not ideal, but it's all that's left.

    Brian Singh : With `PreserveWhitespace = true;` I see it in the debug window (inner xml) and also when the file is saved but it is unencoded. My app is an intermediary; it combines a number of smaller xml documents into a single larger document so I need to preserve the encoded form.
    Brian Singh : I would just append them all together if there were not business requirements that dictate modifications to the smaller xml as the larger is constructed.
    Jon Skeet : I don't see why you need to preserve the encoded form - every XML parser should treat the two as being the same. Could you explain the requirement in more detail?
    Brian Singh : Joel – Correct. I have no control over the input documents. I am leaning towards using file streams and regular expressions to achieve what I need to do.
    Brian Singh : Jon – The purpose of my application is to automate a manual process done by the front-end team (creators of the XSLTs). I am taking files as input and generating files as output.
    Brian Singh : From the front-end team perspective I am simply building a larger XSLT using their existing smaller XSLTs and applying some rules on what gets included in the final file.
    Brian Singh : They are determining the correctness of the results based on the input they had provided me and it makes them very nervous when things are missing (&#10; is only one example).
    Joel Coehoorn : I advise against regular expressions. They're just not suited for evaluating this kind of document. The two step approach, while perhaps slower, will result in cleaner, simpler, more maintainable code.
  • &#10; is a linefeed - i.e. whitespace. The XML parser will load it in as a linefeed, and thereafter ignore the fact that it was originally encoded. The encoding is just part of the serialization of the data to text format - it's not part of the data itself.

    Now, XML sometimes ignores whitespace and sometimes doesn't, depending on context, API etc. As Joel says you may find that it's not missing at all - or you may find that using it with an API which allows you to preserve whitespace fixes the problem. I wouldn't be at all surprised to see it turned into an unencoded linefeed character when you output the data though.

    Brian Singh : Yes - It is indeed an unencoded linefeed character once the data is output - unfortunately I need to keep the encoded form.
    U62 : Does doc.PreserveWhitespace = true; help?
    bobince : No, it won't. A conforming XML processor may not distinguish between a newline character and a character reference to code 10 in element content, full stop. (It's different in attribute values.) Why do you need to keep the encoded form?
    Brian Singh : bobince - see comments section in Joel Coehoorn answer
  • Maybe it would be better to keep the data in a <![CDATA[ ... ]]> section?

    http://www.w3schools.com/XML/xml_cdata.asp
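
The point made by Jon Skeet and bobince, that a parser sees no difference between the character reference and a literal line feed in element content, is easy to check. The sketch below uses Python's standard xml.etree.ElementTree rather than .NET's XmlDocument, and a plain text element instead of xsl:text to avoid declaring the namespace; both are assumptions made to keep the example short.

```python
import xml.etree.ElementTree as ET

# One document uses the character reference, the other a literal line feed.
encoded = ET.fromstring("<text>&#10;</text>")
literal = ET.fromstring("<text>\n</text>")


def same_content(a, b):
    """True when two parsed elements carry identical character data."""
    return a.text == b.text
```

After parsing, both elements hold a single line-feed character, so nothing in the parsed tree records which spelling the source file used. That is why keeping the encoded form requires working on the text of the file rather than on the parsed document.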

How do I dynamically access request parameters with JSP EL?

I'm looping through a list of items, and I'd like to get a request parameter based on the item's index. I could easily do it with a scriptlet as done below, but I'd like to use expression language.

<c:forEach var="item" items="${list}" varStatus="count">

   <!-- This would work -->
   <%=request.getParameter("item_" + count.index)%>

   <!-- I'd like to make this work -->
   ${param.?????}

</c:forEach>
From stackoverflow
  • <c:set var="index" value="item_${count.index}" />
    ${param[index]}
    

    Unfortunately, + doesn't work for strings like in plain Java, so

    ${param["item_" + count.index]}
    

    doesn't work ;-(

    ScArcher2 : i corrected the reference to params. it's supposed to be param. But your answer gave me what I needed to get it working. Thanks!
  • There is a list of implicit objects in the Expression Language documentation section of the J2EE 1.4 documentation. You're looking for param.

    ScArcher2 : Thanks I looked it up and realized I was accessing the wrong thing. The main thing I was missing was the bracket syntax for accessing a property.
  • You just need to use the "square brackets" notation. With the use of a JSTL <c:set> tag you can generate the correct parameter name:

    <c:forEach var="item" items="${list}" varStatus="count">
      <c:set var="paramName">item_${count.index}</c:set>
      ${param[paramName]}
    </c:forEach>
    

How can I find out how many rows a MySQL query returns in Java?

How can I find out how many rows a MySQL query returns in Java?

From stackoverflow
  • From the jdbc faq:

    18. There is a method getColumnCount in the JDBC API. Is there a similar method to find the number of rows in a result set?

    No, but it is easy to find the number of rows. If you are using a scrollable result set, rs, you can call the methods rs.last and then rs.getRow to find out how many rows rs has. If the result is not scrollable, you can either count the rows by iterating through the result set or get the number of rows by submitting a query with a COUNT column in the SELECT clause.

  • I don't think you can, except maybe by calling ResultSet.last() and then ResultSet.getRow() - but I don't know if that will actually work. I've always just processed each row at a time, and counted them afterwards.

  • If you use a CachedRowSet you can find out how many rows your statement returned and iterate both ways, forward and backward, with the drawback that the full rowset must be held in memory instead of being fetched dynamically.
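
The two fallbacks the JDBC FAQ describes for a non-scrollable result (count while iterating, or issue a separate COUNT query) look much the same in any database API. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL and JDBC; the items table and both function names are invented for the example.

```python
import sqlite3

# In-memory database with three rows, standing in for the real MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])


def count_by_iterating(connection, sql, params=()):
    """Run the query and count the rows as they are fetched."""
    return sum(1 for _ in connection.execute(sql, params))


def count_with_query(connection, sql, params=()):
    """Wrap the query in SELECT COUNT(*) so the database does the counting.

    sql must be trusted text written by the programmer, never user input,
    because it is spliced into the statement.
    """
    wrapped = "SELECT COUNT(*) FROM (" + sql + ") AS sub"
    return connection.execute(wrapped, params).fetchone()[0]
```

Counting while iterating costs nothing extra if you are going to process every row anyway; the COUNT query is the better choice when only the number is needed.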

PHP sessions with HTML

I have a website which uses PHP and HTML pages, I want to create a session which stores a username from the login page. But the login pages are php and the next pages are html.

Is this a problem or can I just add a small statement of php into the html page saying

<?php session_start();
$_SESSION['loginid'] = $_POST['username'];
?>

Or am I doing it wrong?

This is the first time I've used sessions and they confuse me a little.

thanks for any help.

From stackoverflow
  • As the sessions are handled by PHP, it needs PHP to maintain the state. You need at least session_start() to use the session variables stored in $_SESSION.

  • You can't put php into .html files without playing around with your server's configuration files. You should only put php into .php files.

    If you have a lot of .html files, you can simply rename them to .php files. It's okay to put pure html into something.php. So, you should make sure that all of your files end with .php, and then you can put any session logic you want into them.

  • You are trying to share a PHP session variable with a page that is of type text/html. As you suggested you must make the HTML page a PHP page for this to work and add a little snippet of PHP somewhere to display the user name.

    Change your HTML page to PHP. At the top of the page add something like this:

    <?php
      session_start(); // must be before any output
      $username = $_SESSION['username']; // or whatever you called it
      // check that $username is valid here (safe to display)
    ?>
    html here
    Hello <?= $username ?>!
    
    Pim Jager : Note that this requires short tags; for maximum support use <?php echo $username; ?> in the last line. Also, as said, make sure to strip HTML tags from $username there. Otherwise they'll be rendered in your page (XSS).
  • If you have access to your Apache configuration, or a simple .htaccess file, you can tell Apache to handle PHP code inside an .html file. You can do this by creating an .htaccess file (remember the . (dot) as the first character in that filename) in the document root of the site (probably public_html/) and putting this into it:

    # Add this to public_html/.htaccess file
    AddHandler application/x-httpd-php .html
    AddHandler application/x-httpd-php .htm
    

    You should be able to reload the HTML page, and your PHP code (from Michael Matthews' answer) will run.

    TravisO : Of course don't forget that parsing every HTML page with the PHP engine is a minor slowdown, but probably not one you'd ever notice.
    localshred : I agree, though it is only a slow down if you have any pages on the site that truly are static. Otherwise, it's the exact same thing as doing everything under a .php extension.

Reading and editing HTML in .Net

Is there a .Net class for reading and manipulating HTML other than System.Windows.Forms.HtmlDocument?

If not, are there any open source libraries for this?

From stackoverflow
  • I would do something like this if it is XHTML compliant:

    System.Xml.XmlDocument xDoc = new System.Xml.XmlDocument();
    xDoc.LoadXml(html);
    

    And edit it that way. If it needs some cleaning up (XHTML conversion), you can use HtmlTidy or Ntidy. Alternatively, you can use the HTMLTidy wrapper example below:

    string input = "<p>broken html<br <img src=test></div>";
    HtmlTidy tidy = new HtmlTidy();
    string output = tidy.CleanHtml(input, HtmlTidyOptions.ConvertToXhtml);
    XmlDocument doc = new XmlDocument();
    doc.LoadXml(output);
    

    StackOverFlow Reference

    EDIT above will be converted to XHtml

    ChrisW : Surely that only works with XHTML: not with HTML.
    cgreeno : Y is this down voted? Is it not a valid option????
    hmcclungiii : I'd imagine it was down voted because the question had nothing to do with XML.
    cgreeno : YES but the question asks for other OPTIONS on how to manipulate HTML! XHTML is just a reformulation of HTML in XML.
    Cyril Gupta : I don't think it deserves a down vote. so I voted it up.
    hmcclungiii : Then he'll fall into the trap of XML validation among many other things, that I'd guess by his wording would be way more than he is bargaining for. Instead of manipulating straight HTML, you would suggest he "reformulate" it? Sorry, I just don't agree, and I think your CAPS are a bit rude.
    cgreeno : Reformulating it? XHtML is valid HTML as well.... so by turning HTML to XHTML you would not only be manipulating the required data but outputting something better.... You may not agree, but it is a valid option.
    hmcclungiii : Oh, I didn't down vote it. Without knowing exactly what his purposes are, I would say that XHTML is overkill, to put it more simply.
  • Why don't you like System.Windows.Forms.HtmlDocument and Microsoft.mshtml?

    mdresser : Because it requires a reference to System.Windows.Forms which isn't so appropriate for a class library or for asp.net.
  • You could use the MSHTML library. However, it is COM/ActiveX, but if you are using Visual Studio, it will create a managed wrapper for you automatically.

    ChrisW : Is the (unmanaged) MSHTML library the same thing as the (managed) System.Windows.Forms.HtmlDocument?
    ChrisW : I assumed that HtmlDocument is a managed wrapper around the unmanaged MSHTML ... you're saying this isn't so?
  • You can always use the LiteralControl:

    PlaceHolder.Controls.Add(new LiteralControl("<div>some html</div>"));
    
  • It seems that the best option for parsing Html in .Net apps is to use the Html Agility Pack library found on codeplex. This provides full DOM access to the HTML and is very straightforward to use.

Good .Net Remoting Tutorial/Guide

Does anyone have any links to a good, concise, Tutorial or guide to .Net remoting? Would be grateful!

Thanks (sorry for the short post!)

Edit: I am now aware of: http://www.codeproject.com/KB/XML/remotingsimpleeng.aspx

From stackoverflow
  • Nothing beats Ingo Rammer's book on .NET Remoting from Apress http://www.amazon.com/Advanced-NET-Remoting-Ingo-Rammer/dp/1590590252

    Bogdan Gavril : Yup, I learned .net remoting from this book as well.
    Charles Bretana : Excellent, the bible for remoting...
  • Unless you have some specific requirements that require .NET Remoting, you should probably take a look at Windows Communication Foundation (WCF) instead. WCF provides the functionality of .NET Remoting (and more), and as far as I know WCF is now the recommended technology to use.

    Terry Donaghe : Unless he has a corporate requirement not to use anything new. A lot of places aren't using WCF yet because they have myopic IT rules, etc.
    Damien : Yeah, Developing in .Net 2.0 otherwise I would use WCF
  • Here are my bookmarks on .NET Remoting articles: http://delicious.com/dreikanter/remoting. The most detailed manual is "Remoting Architecture in .NET". The others contain different usage examples of this technology.

What's your favorite OS Power tool?

What's your favorite power tool that helped you the most? OS doesn't matter.

Please note this is a community wiki. I would like this list to help all of us in finding more powerful tools.

If possible, list one tool (or multiple tools for multiple OSes) per answer. This will make voting for tools easier.

From stackoverflow
  • Launchy

  • For windows - Process Monitor, Process Explorer

  • RapidEE for (finally!) painless environment variable management on Windows

  • Does cygwin count? Being able to "grep" on Windows is HUGE.

    Jonathan Leffler : Yes - see my proposal (Perl and Shell - same reasoning).
    Tomalak : There is a native port of grep for Win32, along with many other gnu utils. Look for GnuWin32 on sourceforge.
  • So many to choose from...

    • Perl (because I've been using it for a long time and haven't migrated to Python)
    • Shell (Bourne/Korn/POSIX/Bash)

    Both count in my book for generality and availability.

  • Powershell

  • I would have to say one of the following:

    • GCC
    • Text processing utilities (sed/awk/grep/tail)
    • Perl/Shell scripting
    • On MySQL servers mytop to view the queries that are happening
    • For debugging both gdb and ktrace/strace

    I am a programmer and power tools to me may not be power tools to you! I like writing my own when I need them as well.

  • Berkeley Utilities

  • On Linux (and many other Unix) systems strace. It allows me to debug system administration problems by looking at a command's interface to the operating system. It works on any binary (doesn't require compiling with debug symbols), and is especially useful if I don't (or can't be bothered to) have access to the command's source code. It can pinpoint configuration errors (where is this program looking for its input files?), uncover undocumented features (why is the program trying to open this file?), and performance problems (the command is writing to the file a character at a time). Similar tools are ltrace (Linux; works at the level of library calls), truss (FreeBSD and Solaris), and, of course, dtrace (Solaris, Mac OS X, and FreeBSD).

  • Bash. It does so much stuff well that would be cumbersome in a GUI.

  • Perl, of course :)

  • UltraMon (for Windows). I love being able to have my task bar stretch across all three monitors and easily move apps from one monitor to another without having to minimize and drag.

    On the Macintosh it would have to be TextMate.

  • SlickRun

    And TCPView from SysInternals. Much easier than continually running "netstat -ban" in a shell.

  • Autoruns from Sysinternals (Mark Russinovich and Bryce Cogswell). An essential cleanup tool.

  • For Windows, Far Manager + lots of plugins.

  • UltraMon, as mentioned above, and PowerMenu, which allows me to set any window to be Always On Top, or semi transparent, and really helps with laying things out on the screen.

    • Gnome-DO
    • APT (Advanced Package Tool on Debian and Ubuntu)
  • I can't imagine life without GNU screen. There are so many features that just make things a bit easier. Stuff like being able to detach a session and then reattach to it from any computer anywhere, with all programs still running. Or searching through the scroll history. Or running several screens in the same terminal window. And lots of other stuff.

  • Mac: Quicksilver. Win: SlickRun.

  • Windows Key + R = Instant access to everything. In addition, I create batch files in the Windows directory so I can do tasks that usually take multiple commands with one command.

    Eclipse : This has kind of been replaced in Vista by just Winkey. I have a folder in the start menu now that just consists of shortcuts to batch files that gets picked up by the search.
    MrValdez : I haven't tried Vista but thanks for the heads up. I'll be keeping an eye out for this feature once Windows 7 is released (I'm hearing good things about it. Hopefully, once I get familiar with it, it will make me switch permanently from XP)
  • Dexpot + Clipx + Winsplit revolution + Input Director

    Multiple desktops, clipboard history, fast & easy window positioning and using networked computer screens as simply as a multi-monitor setup.

    And Launchy, but it has already been mentioned.

    And Vim, of course.

  • Ditto

    Ditto is an extension to the standard windows clipboard. It saves each item placed on the clipboard allowing you access to any of those items at a later time. Ditto allows you to save any type of information that can be put on the clipboard, text, images, html, custom formats, .....

  • For Mac OS X: QuickSilver

  • The Everything desktop search tool. It's face-meltingly good. Fast, flexible, unthinkably small (the complete download is 334Kb!)

    The first thing I do on a new computer is install this and Autohotkey, and remap Win+F to run it.

ASP.NET MVC: Way to figure out the route of the referer (sic)?

I have some POST actions on my controller that are hit from a pair of GET actions. When validation fails, I want to render the view of the action that the POST is coming from. For example:

~/accounts POSTs to ~/accounts/disable - render "index" view on validation error

~/accounts/profile POSTs to ~/accounts/disable - render "profile" view on validation error

I can get the referer (sic) out of server-variables and parse it to figure out the action, but was hoping there would either be something built in that does what I want, or someone else has already done this that I could crib from.

It seems the ControllerContext.RouteData property only has information about the current request, not the refering (sic) request...?

I'm on ASP.NET MVC beta.

From stackoverflow
  • MVC Contrib includes an extension method on String to get a route (specifically, in here). In theory this would let you do

    RouteData referrer = Request.UrlReferrer.PathAndQuery.Route();
    

    or, of course, wrap that up in an extension method on HttpRequest so you can just do

    RouteData referrer = Request.ReferrerRoute();
    

    However, it's intended for unit testing and uses Rhino Mocks to create the HttpContextBase that GetRouteData() requires. The good news is that it seems to get away with mocking very little of it, so you might be able to avoid Rhino Mocks if you create your own FakeHttpContext class derived from HttpContextBase.

Is it possible to use a primitive type (int) as a generic type in Java?

Specifically, with a SortedMap<Vector<String>, int> I get "dimensions expected after this (int) token." Help!

From stackoverflow
  • No, this is not possible. Use Integer instead. Autoboxing takes care of the rest (i.e. for most purposes you can program as if you had actually used int because Java converts to and from Integer automatically for you).

    willcodejavaforfood : he was just after the badge
    Konrad Rudolph : @willcode…: You figured me out. :-/ If not for the badge, I would have given a wrong, thoroughly misleading answer that would have led the OP on a merry goose chase through the internet. Now, quick, get off the Internet. It's dangerous out here.
    willcodejavaforfood : If not for the badge you would not have posted the question at all I suspect :)
    Konrad Rudolph : @willcode…: I have *not* posted the question. I've just corrected the formatting.
  • If you knew that why did you ask the question? :)

    Jason Coco : He didn't ask the question, he just made some edits to it for readability ;)
    willcodejavaforfood : @Jason - OK that makes more sense :)
  • Konrad is correct. Alternatively, you can use the Trove class TObjectIntHashMap to map Objects to primitive ints.
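
The Integer-plus-autoboxing suggestion from the first answer might look like the sketch below. The class name BoxedMapDemo and its methods are hypothetical, invented for this example. One assumption goes beyond the question: Vector does not implement Comparable, so a sorted map keyed by Vector<String> needs an explicit Comparator, and ordering by the vector's string form is an arbitrary choice made here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.Vector;

// Hypothetical demo class, named here only for illustration.
class BoxedMapDemo {
    // The value type is Integer because a primitive int cannot be a type argument.
    // The comparator (ordering by toString()) is an assumption of this sketch.
    static SortedMap<Vector<String>, Integer> newCounts() {
        return new TreeMap<>(Comparator.comparing((Vector<String> v) -> v.toString()));
    }

    static int increment(SortedMap<Vector<String>, Integer> counts, Vector<String> key) {
        int current = counts.getOrDefault(key, 0); // Integer unboxed to int
        counts.put(key, current + 1);              // int boxed back to Integer
        return counts.get(key);                    // unboxed again on return
    }

    public static void main(String[] args) {
        SortedMap<Vector<String>, Integer> counts = newCounts();
        Vector<String> key = new Vector<>(Arrays.asList("a", "b"));
        increment(counts, key);
        System.out.println(increment(counts, key));
    }
}
```

Unboxing a null Integer throws NullPointerException, which is why the sketch reads the map with getOrDefault instead of a bare get for keys that may be absent.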