
JSP/Java – strip_tags() PHP like function

Another widely used PHP function is strip_tags.
It returns a copy of a given string with the HTML tags removed, except for any tags named in an allowed list. The Java version below takes that list as a comma-separated string of tag names.

   1:      public static String strip_tags(String text, String allowedTags) {
   2:          String[] tag_list = allowedTags.split(",");
   3:          Arrays.sort(tag_list);
   4:   
   5:          final Pattern p = Pattern.compile("<[/!]?([^\\s>]*)\\s*[^>]*>",
   6:                  Pattern.CASE_INSENSITIVE);
   7:          Matcher m = p.matcher(text);
   8:   
   9:          StringBuffer out = new StringBuffer();
  10:          int lastPos = 0;
  11:          while (m.find()) {
  12:              String tag = m.group(1);
  13:              // if tag not allowed: skip it
  14:              if (Arrays.binarySearch(tag_list, tag) < 0) {
  15:                  out.append(text.substring(lastPos, m.start())).append(" ");
  16:   
  17:              } else {
  18:                  out.append(text.substring(lastPos, m.end()));
  19:              }
  20:              lastPos = m.end();
  21:          }
  22:          if (lastPos > 0) {
  23:              out.append(text.substring(lastPos));
  24:              return out.toString().trim();
  25:          } else {
  26:              return text;
  27:          }
  28:      }
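
To see how the method behaves, here is a small self-contained sketch. It repeats the method with the whitespace class written as \\s in the string literal, and swaps StringBuffer for StringBuilder. The class name StripTagsDemo and the sample inputs are assumptions made for illustration, not part of the original post.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class so the example compiles on its own.
class StripTagsDemo {

    // Same logic as the strip_tags method above, using StringBuilder.
    public static String strip_tags(String text, String allowedTags) {
        String[] tag_list = allowedTags.split(",");
        Arrays.sort(tag_list); // binarySearch requires a sorted array

        final Pattern p = Pattern.compile("<[/!]?([^\\s>]*)\\s*[^>]*>",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);

        StringBuilder out = new StringBuilder();
        int lastPos = 0;
        while (m.find()) {
            String tag = m.group(1);
            if (Arrays.binarySearch(tag_list, tag) < 0) {
                // tag not allowed: keep the text before it, replace the tag with a space
                out.append(text.substring(lastPos, m.start())).append(" ");
            } else {
                // tag allowed: keep the text and the tag itself
                out.append(text.substring(lastPos, m.end()));
            }
            lastPos = m.end();
        }
        if (lastPos > 0) {
            out.append(text.substring(lastPos));
            return out.toString().trim();
        }
        return text; // no tags found
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>big</b> world</p>";
        System.out.println(strip_tags(html, "b"));          // Hello <b>big</b> world
        System.out.println(strip_tags(html, ""));           // Hello  big  world
        System.out.println(strip_tags("<B>loud</B>", "b")); // loud
    }
}
```

The second call shows that every removed tag leaves a space behind, so adjacent tags produce double spaces. The third shows that the allowed list is compared case sensitively: <B> is not matched by the entry b.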

5 Responses to “JSP/Java – strip_tags() PHP like function”

  1. You have a mistake with a variable name on line 13. It should be changed to this:
    if(Arrays.binarySearch(tag_list, tag) < 0) {

    It’s also worth noting that this implementation is case sensitive when it compares tag names with the allowed list, so <B> is not matched by an allowed entry of b.

  2. Additionally, two backslashes (\\) need to be added before each occurrence of “s” in the regex pattern.

  3. Remus Stratulat [Member] says:

    Thank you for your observations. I’ve corrected them.

  4. This should work as well.

        public static String stripTags(String text, String allowed) {
            String pattern = "";
            String[] allow = allowed.split(",");
            if (allow.length > 0) {
                StringBuffer sb = new StringBuffer();
                sb.append(allow[0]);
                for (int i = 1; i < allow.length; i++) {
                    sb.append("|");
                    sb.append(allow[i]);
                }
                pattern = "";
            } else {
                pattern = "";
            }
            return text.replaceAll(pattern, "");
        }

  5. At first glance, it seems this method removes all allowed tags and keeps the others, which is the opposite of what it should do.

    Please tell me if I am wrong.
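
One way to make a replaceAll-based version keep the allowed tags and remove everything else is a negative lookahead in the regex. The following is only a sketch under stated assumptions, not the commenter's original code: the class name StripTagsLookahead and the exact pattern are made up for illustration, and tag names are matched case insensitively.

```java
import java.util.regex.Pattern;

// Hypothetical class name; the pattern below is an illustrative assumption.
class StripTagsLookahead {

    // Removes every tag whose name is not in the comma-separated allowed list.
    public static String stripTags(String text, String allowed) {
        StringBuilder names = new StringBuilder();
        for (String name : allowed.split(",")) {
            name = name.trim();
            if (name.isEmpty()) {
                continue; // ignore empty entries such as "" or "b,,i"
            }
            if (names.length() > 0) {
                names.append('|');
            }
            names.append(Pattern.quote(name)); // treat the tag name literally
        }

        String pattern;
        if (names.length() > 0) {
            // Match "<...>" only when "<" is NOT followed by an allowed name
            // (optionally preceded by "/") and then whitespace, "/" or ">".
            pattern = "(?i)<(?!/?(?:" + names + ")[\\s/>])[^>]*>";
        } else {
            // Nothing allowed: remove every tag.
            pattern = "<[^>]*>";
        }
        return text.replaceAll(pattern, "");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>big</b> world</p>", "b"));
        System.out.println(stripTags("<i>x</i><u>y</u>", "b,i"));
    }
}
```

Unlike the method in the post, this sketch does not insert a space where a tag was removed, and an allowed entry of b does not accidentally keep <br>, because the lookahead requires whitespace, "/" or ">" right after the name.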
