Discussion:
results for giant mass-check (phew)
Justin Mason
2002-08-20 12:46:26 UTC
Here we go.....

Top tests are listed first, worst tests last. I've done a little bit of
reordering (e.g., moving compensation tests that we *know* are extremely
low-frequency into the "safe" zone), and the tests I want to drop are the ones
after the --------END OF GOOD TESTS--------- line.

(I plan to either comment the tests out or move them to an 'attic' file in CVS,
btw, so rehabilitators who think they can up the hit rates after the 2.40
release are welcome to try.)

That leaves 708 tests and drops about 200.

comments? protests? ;)

--j.


OVERALL% SPAM% NONSPAM% S/O SCORE NAME
307257 115762 191495 0.38 0.00 (all messages)
100.000 37.676 62.324 0.38 0.00 (all messages as %)
3.777 0.028 6.043 0.00 -100.00 USER_IN_WHITELIST
22.947 54.626 3.796 0.94 1.53 CLICK_BELOW
19.157 0.069 30.696 0.00 -3.38 IN_REP_TO
18.149 43.982 2.532 0.95 1.67 CTYPE_JUST_HTML
14.905 36.623 1.777 0.95 2.06 BIG_FONT
8.661 22.797 0.115 0.99 2.20 REMOVE_PAGE
8.446 22.170 0.149 0.99 1.00 MISSING_MIMEOLE
11.575 28.642 1.259 0.96 1.00 HTML_FONT_COLOR_RED
12.260 29.911 1.590 0.95 0.85 CLICK_HERE_LINK
10.509 0.088 16.808 0.01 -0.10 REFERENCES
5.841 15.502 0.001 1.00 4.34 MIME_ODD_CASE
7.585 19.022 0.672 0.97 1.50 SPAM_PHRASE_13_21
12.069 27.982 2.449 0.92 0.38 FROM_ENDS_IN_NUMS
5.775 15.129 0.120 0.99 2.54 SUBJ_HAS_UNIQ_ID
8.522 20.842 1.074 0.95 3.31 NORMAL_HTTP_TO_IP
6.212 15.992 0.300 0.98 1.08 EXCUSE_3
9.552 22.876 1.498 0.94 1.00 HTML_FONT_COLOR_BLUE
5.246 13.853 0.043 1.00 4.04 SUBJ_HAS_SPACES
6.001 15.480 0.270 0.98 1.00 CLICK_BELOW_CAPS
4.853 12.876 0.002 1.00 4.39 INVALID_DATE_TZ_ABSURD
9.922 0.473 15.634 0.03 -1.00 QUOTED_EMAIL_TEXT
4.734 12.410 0.095 0.99 2.00 SPAM_PHRASE_21_34
8.029 19.068 1.357 0.93 1.00 HTML_50_70
5.669 14.009 0.627 0.96 1.00 INVALID_DATE
4.238 10.944 0.184 0.98 2.38 DATE_IN_FUTURE_06_12
9.320 21.057 2.224 0.90 0.50 SPAM_PHRASE_08_13
4.456 11.360 0.283 0.98 1.00 MISSING_OUTLOOK_NAME
3.470 9.167 0.026 1.00 3.33 TRACKER_ID
3.412 9.028 0.017 1.00 4.30 SUBJ_FULL_OF_8BITS
5.104 12.606 0.569 0.96 2.58 FROM_HAS_MIXED_NUMS
3.161 8.389 0.001 1.00 1.00 FROM_HAS_MIXED_NUMS2
5.748 0.136 9.141 0.01 -1.00 USER_AGENT
6.368 0.289 10.043 0.03 -2.50 KNOWN_MAILING_LIST
3.010 7.976 0.008 1.00 1.00 CLICK_HERE_CAPS_LINK
3.435 8.897 0.133 0.99 1.00 MORTGAGE_OBFU
2.872 7.619 0.002 1.00 1.14 DATE_YEAR_ZERO_FIRST
3.520 8.999 0.208 0.98 2.06 BASE64_ENC_TEXT
4.292 10.606 0.476 0.96 1.00 HTML_FONT_COLOR_UNSAFE
9.535 20.586 2.854 0.88 0.01 LINES_OF_YELLING
21.573 40.735 9.989 0.80 0.50 NO_REAL_NAME
3.564 8.977 0.292 0.97 0.70 MIME_LONG_LINE_QP
2.704 7.134 0.027 1.00 1.00 TABLE_THICK_BORDER
2.639 6.983 0.013 1.00 2.00 HEADER_8BITS
2.919 7.588 0.096 0.99 1.88 NO_OBLIGATION
3.066 7.902 0.142 0.98 1.00 HTML_FONT_COLOR_GREEN
2.528 6.708 0.001 1.00 4.00 FORGED_AOL_RCVD
2.681 7.032 0.052 0.99 1.00 SUBJ_FREE_CAP
2.718 7.102 0.068 0.99 3.89 MIME_MISSING_BOUNDARY
3.430 8.569 0.324 0.96 1.00 OFFER
4.086 0.043 6.530 0.01 -1.00 EMAIL_ATTRIBUTION
4.126 9.932 0.616 0.94 1.00 HTML_FONT_INVISIBLE
2.742 7.064 0.129 0.98 3.16 WORK_AT_HOME
9.104 19.110 3.055 0.86 0.78 MAILTO_LINK
3.847 9.258 0.575 0.94 1.94 UPPERCASE_25_50
3.165 7.855 0.330 0.96 1.13 FOR_FREE
3.708 0.036 5.928 0.01 -1.00 X_MAILING_LIST
2.161 5.713 0.014 1.00 1.00 MARKETING_PARTNERS
3.100 7.705 0.316 0.96 0.40 EXCUSE_14
2.570 6.586 0.142 0.98 1.00 PRIORITY_NO_NAME
2.911 7.304 0.255 0.97 1.00 HTML_FONT_COLOR_NAME
4.683 10.787 0.994 0.92 1.35 HTML_WITH_BGCOLOR
6.183 13.563 1.721 0.89 0.10 WEB_BUGS
2.097 5.497 0.042 0.99 3.42 COMPLETELY_FREE
2.167 5.646 0.063 0.99 4.43 FRONTPAGE
4.323 10.023 0.878 0.92 0.25 SUBJ_ALL_CAPS
2.232 5.774 0.091 0.98 1.00 REMOVE_SUBJ
4.100 9.577 0.789 0.92 1.45 FORGED_YAHOO_RCVD
2.794 6.888 0.319 0.96 1.00 HTML_FONT_COLOR_YELLOW
5.300 11.683 1.442 0.89 1.00 HTML_FONT_FACE_ODD
1.806 4.750 0.027 0.99 3.31 DATE_IN_FUTURE_12_24
4.149 9.458 0.939 0.91 1.00 HTML_FONT_COLOR_GRAY
4.317 9.774 1.019 0.91 0.07 LINES_OF_YELLING_2
3.485 8.156 0.661 0.93 1.90 MAILTO_WITH_SUBJ
1.806 4.716 0.046 0.99 3.12 CHARSET_FARAWAY_HEADERS
2.298 5.758 0.206 0.97 2.08 OPT_IN
2.903 0.039 4.635 0.01 -1.00 USER_AGENT_PINE
1.818 4.723 0.062 0.99 1.00 BAD_CREDIT
2.833 0.041 4.521 0.01 -2.00 SIGNATURE_SHORT_DENSE
1.725 4.489 0.054 0.99 4.24 NO_EXPERIENCE
4.790 10.505 1.335 0.89 1.60 MSG_ID_ADDED_BY_MTA_2
2.610 6.316 0.369 0.94 1.00 ONLY_COST
1.633 4.252 0.049 0.99 0.32 SUBJ_REMOVE
2.068 5.165 0.196 0.96 1.00 SAVE_MONEY
1.664 4.307 0.066 0.98 1.00 ACT_NOW
1.719 4.419 0.087 0.98 2.60 REMOVE_IN_QUOTES
1.418 3.756 0.004 1.00 1.00 CONSOLIDATE_DEBT
2.019 5.016 0.207 0.96 0.26 EMAIL_MARKETING
2.377 0.029 3.797 0.01 -1.00 GROUPS_YAHOO_1
2.983 6.876 0.630 0.92 0.37 MAILTO_TO_REMOVE
2.151 5.150 0.337 0.94 2.33 EXCUSE_1
2.451 0.110 3.866 0.03 -1.00 X_LOOP
1.757 4.341 0.196 0.96 1.00 HTML_FONT_COLOR_CYAN
1.881 0.000 3.018 0.00 -1.00 USER_AGENT_MUTT
1.382 3.545 0.074 0.98 2.29 US_DOLLARS_3
1.476 3.737 0.109 0.97 1.00 MANY_EXCLAMATIONS
1.124 2.978 0.003 1.00 1.00 REFINANCE
1.143 3.018 0.009 1.00 4.35 MORTGAGE_RATES
1.467 3.705 0.114 0.97 1.00 FORGED_RCVD_TRAIL
1.865 0.011 2.986 0.00 -3.00 FROM_EGROUPS
1.094 2.899 0.002 1.00 1.00 EXCUSE_FUTURE
1.303 3.348 0.066 0.98 1.00 HTML_FONT_COLOR_MAGENTA
1.220 3.157 0.050 0.98 1.00 DIET
3.486 7.551 1.029 0.88 0.50 FORGED_HOTMAIL_RCVD
1.017 2.698 0.002 1.00 4.18 FREE_CONSULTATION
1.012 2.685 0.000 1.00 4.55 X_PRECEDENCE_REF
1.675 0.009 2.683 0.00 -1.00 USER_AGENT_KMAIL
1.019 2.687 0.010 1.00 4.92 MAILTO_WITH_SUBJ_REMOVE
1.004 2.655 0.007 1.00 1.00 FREE_PORN
1.965 4.635 0.351 0.93 1.64 DATE_IN_PAST_06_12
1.019 2.683 0.014 0.99 1.00 BEST_PORN
0.968 2.568 0.001 1.00 0.20 JUST_MAILED_PAGE
0.961 2.552 0.000 1.00 4.14 X_ENC_PRESENT
2.429 5.511 0.566 0.91 0.48 LINES_OF_YELLING_3
1.235 3.129 0.090 0.97 0.36 EXCUSE_10
1.459 3.583 0.174 0.95 1.94 NO_COST
1.120 2.854 0.071 0.98 4.30 LIMITED_TIME_ONLY
1.519 0.011 2.430 0.00 -0.50 SIGNATURE_LONG_SPARSE
1.049 2.694 0.055 0.98 1.00 SUB_FREE_OFFER
1.430 3.472 0.196 0.95 1.39 PORN_4
1.391 3.380 0.189 0.95 1.00 HTML_70_90
1.422 0.014 2.274 0.01 -3.13 PGP_SIGNATURE
0.826 2.187 0.004 1.00 1.00 THE_BEST_RATE
0.874 2.291 0.018 0.99 2.23 AS_SEEN_ON
0.925 2.391 0.039 0.98 2.02 UNSUB_SCRIPT
0.891 2.314 0.030 0.99 1.00 ADULT_SITE
0.862 2.241 0.028 0.99 1.00 DO_IT_TODAY
1.501 3.553 0.261 0.93 0.50 UNDISC_RECIPS
0.735 1.951 0.000 1.00 3.40 KOREAN_UCE_SUBJECT
1.729 3.949 0.386 0.91 1.70 DATE_IN_PAST_12_24
0.796 2.068 0.027 0.99 0.48 EXCUSE_15
0.708 1.875 0.002 1.00 1.00 LOW_PAYMENT
0.721 1.898 0.009 1.00 1.00 X_LIBRARY
1.157 2.801 0.164 0.94 2.43 MIME_EXCESSIVE_QP
0.893 2.257 0.069 0.97 3.00 TO_HAS_SPACES
0.697 1.833 0.010 0.99 1.00 SAVE_THOUSANDS
0.642 1.699 0.003 1.00 1.18 SENT_IN_COMPLIANCE
1.193 2.840 0.197 0.94 0.21 SUSPICIOUS_RECIPS
0.621 1.647 0.000 1.00 1.00 FREE_INVESTMENT
0.633 1.665 0.009 0.99 1.00 LOW_INTEREST
0.592 1.570 0.001 1.00 1.00 FREE_PASSWORD
0.825 2.054 0.083 0.96 0.50 MIME_HTML_NO_CHARSET
0.715 1.820 0.047 0.97 4.21 REMOVAL_INSTRUCTIONS
0.595 1.560 0.011 0.99 4.73 CALL_NOW
0.552 1.465 0.000 1.00 1.00 COMPETE
2.273 0.383 3.416 0.10 1.00 T_NOSPAM_INC
0.662 1.701 0.034 0.98 1.50 GUARANTEE
0.583 1.528 0.011 0.99 2.16 CHARSET_FARAWAY
0.737 1.856 0.061 0.97 0.88 BULK_EMAIL
4.485 8.425 2.104 0.80 0.21 MSG_ID_ADDED_BY_MTA_3
0.549 1.447 0.007 1.00 1.00 INITIAL_INVEST
0.886 0.003 1.419 0.00 -1.50 SIGNATURE_SHORT_SPARSE
0.574 1.496 0.017 0.99 1.50 MONEY_BACK
2.532 0.508 3.756 0.12 -1.00 X_ACCEPT_LANG
0.571 1.488 0.017 0.99 3.46 FAKED_UNDISC_RECIPS
0.771 1.909 0.083 0.96 2.88 HOME_EMPLOYMENT
0.519 1.373 0.003 1.00 1.00 PAY_SITE
1.743 3.765 0.521 0.88 0.68 DATE_IN_PAST_03_06
2.557 5.195 0.963 0.84 2.59 UNSUB_PAGE
0.514 1.355 0.005 1.00 0.59 GUARANTEED_100_PERCENT
0.865 0.009 1.383 0.01 -1.00 USER_AGENT_MOZILLA_UA
0.563 1.458 0.021 0.99 1.00 MLM
0.617 1.573 0.039 0.98 1.00 EXTRA_MPART_TYPE
0.507 1.334 0.007 0.99 2.82 SECTION_301
0.889 2.127 0.140 0.94 1.85 INVALID_MSGID
0.979 2.306 0.177 0.93 1.79 ORDER_NOW
0.480 1.272 0.002 1.00 1.00 HGH
1.629 3.521 0.485 0.88 0.90 X_PRIORITY_HIGH
0.631 1.594 0.050 0.97 1.00 SAVE_UP_TO
0.505 1.324 0.010 0.99 1.00 EXTRA_CASH
0.468 1.241 0.001 1.00 2.50 SPAM_PHRASE_34_55
0.559 1.437 0.029 0.98 1.07 OPPORTUNITY
0.537 1.380 0.028 0.98 1.00 CARRIAGE_RETURNS
2.940 5.728 1.255 0.82 0.52 MAILTO_TO_SPAM_ADDR
0.496 1.287 0.018 0.99 1.00 ADVERT_CODE2
0.775 1.863 0.116 0.94 1.30 EXCUSE_7
0.434 1.149 0.001 1.00 1.00 BEEN_TURNED_DOWN
1.044 2.372 0.242 0.91 0.69 JAVASCRIPT_UNSAFE
0.408 1.084 0.000 1.00 2.67 TAKE_ACTION_NOW
0.405 1.073 0.001 1.00 1.00 FREE_GRANT
0.400 1.061 0.000 1.00 1.00 REFINANCE_YOUR_HOME
0.399 1.060 0.000 1.00 1.00 EASY_TERMS
0.395 1.048 0.000 1.00 1.00 REVERSE_AGING
0.901 2.063 0.198 0.91 3.42 DATE_IN_FUTURE_03_06
0.408 1.067 0.010 0.99 1.00 HAIR_LOSS
0.372 0.986 0.001 1.00 1.00 HIDE_WIN_STATUS
0.866 1.989 0.186 0.91 1.00 HOT_NASTY
0.409 1.063 0.015 0.99 1.28 PENIS_ENLARGE
0.413 1.069 0.017 0.98 2.22 HTTP_ESCAPED_HOST
0.356 0.942 0.002 1.00 1.00 FINANCIAL
0.358 0.946 0.003 1.00 4.61 BILL_1618
0.614 1.480 0.090 0.94 2.00 RATWARE_EGROUPS
0.368 0.966 0.006 0.99 4.72 ADVERT_CODE
0.450 1.142 0.032 0.97 0.50 SUB_HELLO
0.568 1.382 0.076 0.95 3.09 DEAR_FRIEND
0.379 0.987 0.011 0.99 1.00 VACATION_SCAM
0.413 1.057 0.024 0.98 2.29 ALL_NATURAL
0.356 0.933 0.007 0.99 3.66 DATE_IN_FUTURE_24_48
0.364 0.948 0.010 0.99 3.37 BUGGY_CGI
1.692 3.425 0.644 0.84 1.00 HTML_FONT_FACE_BAD
0.402 1.030 0.023 0.98 0.85 THIS_AINT_SPAM
0.329 0.871 0.002 1.00 0.39 CLICK_TO_REMOVE_2
0.377 0.972 0.017 0.98 2.46 ONE_TIME_MAILING
2.043 3.998 0.861 0.82 0.69 CALL_FREE
0.314 0.834 0.000 1.00 2.19 X_MAIL_ID_PRESENT
0.614 1.454 0.106 0.93 1.00 NO_FEE
0.348 0.907 0.010 0.99 2.00 SUBJ_ENDS_IN_SPACE
0.347 0.904 0.010 0.99 2.00 RATWARE_HASH_2
0.343 0.895 0.010 0.99 1.00 SAVINGS
0.324 0.852 0.004 1.00 4.58 FORGED_EUDORAMAIL_RCVD
0.332 0.869 0.008 0.99 3.24 FORM_W_MAILTO_ACTION
0.319 0.839 0.004 1.00 1.00 MUST_BE_18
0.847 1.893 0.215 0.90 1.00 HTML_FONT_COLOR_NOHASH
0.317 0.833 0.005 0.99 1.00 CREDIT_CARD
0.395 0.998 0.031 0.97 4.05 STRONG_BUY
0.291 0.773 0.000 1.00 3.76 X_SERV_HOST_PRESENT
0.294 0.779 0.001 1.00 3.41 COPY_ACCURATELY
0.552 0.015 0.876 0.02 -1.00 RESENT_TO
0.728 1.655 0.167 0.91 1.00 T_LOW_PRICE
0.477 0.000 0.766 0.00 -4.00 DEBIAN_BTS_BUG
0.488 1.183 0.067 0.95 1.00 DATE_MISSING
0.289 0.765 0.002 1.00 4.34 CBYI
0.283 0.752 0.000 1.00 1.51 X_X_PRESENT
2.418 4.528 1.143 0.80 0.32 EXCUSE_16
1.526 3.069 0.593 0.84 2.65 HTTP_WITH_EMAIL_IN_URL
1.614 3.208 0.651 0.83 0.26 WEIRD_PORT
0.504 1.207 0.079 0.94 1.00 ALL_CAP_PORN
0.349 0.889 0.023 0.97 3.34 JAVASCRIPT_VERY_UNSAFE
0.272 0.723 0.000 1.00 3.19 X_LIST_UNSUBSCRIBE
0.269 0.714 0.001 1.00 3.52 RESISTANCE_IS_FUTILE
0.588 1.367 0.117 0.92 1.00 MEMBER_2
0.267 0.708 0.000 1.00 0.50 USERNAME_IN_SUBJECT_6
0.267 0.708 0.000 1.00 2.00 USERNAME_IN_SUBJECT_3
0.267 0.708 0.000 1.00 1.00 USERNAME_IN_SUBJECT_5
0.267 0.708 0.000 1.00 1.00 USERNAME_IN_SUBJECT
0.267 0.708 0.000 1.00 1.50 USERNAME_IN_SUBJECT_4
0.267 0.708 0.000 1.00 0.50 USERNAME_IN_SUBJECT_2
0.580 1.351 0.114 0.92 1.92 FOR_JUST_SOME_AMT
6.322 10.139 4.015 0.72 0.10 SPAM_PHRASE_05_08
0.416 1.023 0.049 0.95 1.00 WHY_WAIT
0.282 0.740 0.006 0.99 1.00 BE_BOSS
0.322 0.825 0.018 0.98 4.47 US_DOLLARS
0.254 0.675 0.000 1.00 1.00 HIDDEN_ASSETS
0.258 0.683 0.002 1.00 2.19 PARA_A_2_C_OF_1618
0.302 0.778 0.015 0.98 4.39 FROM_STARTS_WITH_NUMS
0.263 0.690 0.005 0.99 1.00 WHILE_YOU_SLEEP
0.310 0.790 0.019 0.98 0.73 REPLY_REMOVE_SUBJECT
0.259 0.679 0.005 0.99 0.63 AOL_USERS_LINK
0.632 1.428 0.150 0.90 1.40 TONER
0.650 1.461 0.159 0.90 1.51 HTTP_USERNAME_USED
0.409 0.990 0.057 0.95 1.00 SATISFACTION
0.234 0.622 0.000 1.00 1.00 SUBJ_GUARANTEED
0.646 1.448 0.162 0.90 2.39 CASHCASHCASH
0.231 0.613 0.000 1.00 1.00 LOSE_POUNDS
0.591 1.337 0.139 0.91 0.94 DATE_IN_PAST_96_XX
1.057 2.175 0.382 0.85 0.50 FROM_AND_TO_SAME_1
0.276 0.707 0.016 0.98 2.50 MONEY_MAKING
0.227 0.601 0.002 1.00 1.00 VIAGRA_ONLINE
0.320 0.796 0.032 0.96 1.00 HTML_FONT_COLOR_UNKNOWN
0.221 0.586 0.001 1.00 4.28 WE_HONOR_ALL
0.259 0.668 0.013 0.98 1.00 FREE_ACCESS
2.678 4.740 1.432 0.77 1.00 MAY_BE_FORGED
0.217 0.575 0.001 1.00 4.80 MAIL_IN_ORDER_FORM
0.339 0.833 0.041 0.95 1.00 CONGRATULATIONS
0.239 0.622 0.008 0.99 4.56 PRINT_FORM_SIGNATURE
0.282 0.712 0.023 0.97 3.29 DIRECT_EMAIL
0.258 0.659 0.015 0.98 2.00 OBFUSCATING_COMMENT
1.039 2.112 0.391 0.84 0.76 PLING_PLING
0.210 0.555 0.002 1.00 1.00 FREE_MEMBERSHIP
0.228 0.593 0.007 0.99 0.15 EXCUSE_13
0.616 1.358 0.168 0.89 1.00 T_SALE
0.218 0.570 0.006 0.99 4.29 WANTS_CREDIT_CARD
0.246 0.626 0.017 0.97 1.00 ITS_LEGAL
0.248 0.629 0.017 0.97 4.67 VIAGRA
0.191 0.506 0.000 1.00 1.00 WRINKLES
0.217 0.563 0.008 0.99 1.00 DOMAIN_4U2
2.580 4.502 1.418 0.76 1.52 JAVASCRIPT
0.188 0.500 0.000 1.00 1.00 DIG_UP_INFO
0.271 0.676 0.027 0.96 4.32 PENIS_ENLARGE2
0.603 1.315 0.172 0.88 1.00 SAVE_BUCKS
0.551 1.214 0.150 0.89 0.45 GREAT_OFFER
0.208 0.539 0.008 0.98 1.00 ONE_TIME
0.262 0.652 0.026 0.96 1.00 WHILE_SUPPLIES
1.153 2.248 0.492 0.82 1.00 TO_ADDRESS_EQ_REAL
0.179 0.472 0.002 1.00 1.00 UNSECURED_CREDIT
1.661 3.048 0.823 0.79 1.93 RISK_FREE
0.265 0.650 0.031 0.95 1.00 INSTANT_ACCESS
1.430 2.680 0.675 0.80 0.50 ASCII_FORM_ENTRY
0.228 0.574 0.020 0.97 4.41 YOUR_INCOME
0.279 0.676 0.038 0.95 3.00 FOR_INSTANT_ACCESS
0.176 0.461 0.004 0.99 1.60 FROM_AND_TO_SAME_3
0.204 0.520 0.013 0.98 1.00 HTML_WIN_OPEN
0.163 0.433 0.001 1.00 3.50 PREST_NON_ACCREDITED
0.267 0.000 0.428 0.00 -3.13 PGP_SIGNATURE_2
0.179 0.467 0.005 0.99 0.66 X_ESMTP
0.176 0.460 0.005 0.99 4.53 TO_NO_USER
0.392 0.893 0.089 0.91 1.00 MONTH_TRIAL
0.533 1.153 0.158 0.88 1.02 VERY_SUSP_RECIPS
0.158 0.418 0.001 1.00 1.00 ONLINE_PHARMACY
0.153 0.406 0.000 1.00 1.00 COMPLAIN_TO
0.274 0.659 0.041 0.94 2.76 AMAZING
0.429 0.043 0.662 0.06 -1.00 FWD_MSG_2
0.262 0.631 0.039 0.94 4.34 EARN_PER_WEEK
0.144 0.383 0.000 1.00 2.69 HR_3113
0.277 0.659 0.046 0.93 0.42 HTML_EMBEDS
0.277 0.657 0.046 0.93 1.00 OPPORTUNITY_2
0.149 0.392 0.002 1.00 1.00 CUM_SHOT
0.319 0.739 0.065 0.92 0.25 ASKS_BILLING_ADDRESS
0.160 0.416 0.005 0.99 1.00 SUBJ_DOLLARS
0.172 0.441 0.009 0.98 3.70 COPY_DVDS
0.289 0.679 0.054 0.93 2.15 MSGID_HAS_NO_AT
0.143 0.377 0.001 1.00 2.90 EXCUSE_4
0.174 0.443 0.011 0.98 1.75 NUMERIC_HTTP_ADDR
0.139 0.368 0.001 1.00 1.00 FREE_INSTALL
0.234 0.565 0.033 0.94 4.20 BE_AMAZED
0.586 1.220 0.203 0.86 2.61 DATE_IN_PAST_24_48
0.137 0.361 0.001 1.00 1.00 WHY_PAY_MORE
0.133 0.352 0.000 1.00 4.25 NEW_DOMAIN_EXTENSIONS
0.132 0.351 0.000 1.00 2.62 UCE_MAIL_ACT
0.130 0.346 0.000 1.00 1.00 MIME_BOUND_OPTIN
0.169 0.428 0.012 0.97 1.00 LIVE_PORN
0.129 0.342 0.000 1.00 2.00 RATWARE_MBOMBER
0.351 0.788 0.088 0.90 2.97 UPPERCASE_50_75
0.359 0.800 0.092 0.90 1.09 DEAR_SOMETHING
0.130 0.343 0.001 1.00 2.61 JODY
0.141 0.366 0.004 0.99 2.00 RATWARE_JPFREE
0.206 0.000 0.331 0.00 -1.00 SIGNATURE_LONG_DENSE
0.503 1.060 0.167 0.86 1.45 TO_EMPTY
3.837 1.998 4.948 0.29 -0.70 X_AUTH_WARNING
1.167 2.153 0.571 0.79 1.07 MISSING_HEADERS
0.169 0.422 0.015 0.97 1.99 THE_FOLLOWING_FORM
0.119 0.316 0.000 1.00 3.80 WE_HATE_SPAM
0.330 0.738 0.083 0.90 1.00 OFFER_EXPIRE
0.197 0.001 0.316 0.00 -1.00 DISCLAIMER_LEGALESE
0.116 0.308 0.000 1.00 1.00 TO_RECIP_MARKER
0.123 0.322 0.002 0.99 4.20 SUBJ_2_CREDIT
0.180 0.441 0.021 0.95 1.00 NO_PURCHASE
0.113 0.299 0.000 1.00 1.00 MEGA_SITE
0.114 0.302 0.001 1.00 1.00 WE_PROMISE_YOU
0.112 0.296 0.000 1.00 4.52 NONEXISTENT_CHARSET
0.118 0.311 0.002 0.99 4.23 NIGERIAN_SCAM_7
0.161 0.401 0.016 0.96 1.00 ACCEPT_CREDIT_CARDS
0.168 0.415 0.019 0.96 0.95 MASS_EMAIL
0.107 0.284 0.000 1.00 1.00 VIAGRA_COMBO
0.116 0.301 0.004 0.99 1.00 PRIZE
0.157 0.387 0.017 0.96 2.33 FULL_REFUND
0.103 0.272 0.000 1.00 1.00 NO_CREDIT_CHECK
0.102 0.270 0.000 1.00 4.20 GENTLE_FEROCITY
0.101 0.269 0.000 1.00 1.00 MIME_BOUND_HASHES
0.100 0.266 0.000 1.00 4.28 VJESTIKA
0.179 0.003 0.286 0.01 -1.00 HOTMAIL_FOOTER2
0.119 0.308 0.006 0.98 1.00 GUARANTEED_STUFF
0.249 0.566 0.057 0.91 1.00 YOU_WON
0.097 0.258 0.000 1.00 1.00 MARKUP_RAND
0.279 0.621 0.072 0.90 1.00 GIVING_AWAY
0.094 0.251 0.000 1.00 4.00 RCVD_FAKE_HELO_DOTCOM
0.112 0.288 0.005 0.98 1.00 FIND_ANYTHING
0.149 0.366 0.018 0.95 2.00 RATWARE_GROUPMAIL
0.094 0.248 0.001 1.00 1.00 NOT_MLM
0.587 0.145 0.854 0.15 1.00 SUBJECT_IS_LIST
0.093 0.246 0.001 1.00 1.00 SAVE_ON_INSURANCE
0.133 0.330 0.014 0.96 1.00 UNLIMITED
0.131 0.327 0.014 0.96 1.00 FANTASTIC
0.090 0.238 0.001 1.00 1.00 NASTY_GIRLS
0.088 0.232 0.001 1.00 3.84 KIFF
0.102 0.263 0.005 0.98 2.89 DATE_IN_FUTURE_48_96
0.275 0.602 0.078 0.89 1.00 LARGE_COLLECTION
0.864 1.573 0.436 0.78 2.10 SMTPD_IN_RCVD
0.084 0.222 0.000 1.00 1.00 APPLY_ONLINE
0.089 0.234 0.002 0.99 1.00 COMPARE_RATES
0.090 0.234 0.003 0.99 1.00 MIME_BOUND_DIGITS_4
0.398 0.812 0.147 0.85 1.00 PLING_QUERY
0.096 0.246 0.005 0.98 3.03 LOTS_OF_CC_LINES
0.173 0.403 0.034 0.92 2.99 UPPERCASE_75_100
0.094 0.242 0.004 0.98 1.00 LINK_TO_NO_SCHEME
0.079 0.211 0.000 1.00 3.46 MSGID_SPAMSIGN_1
0.128 0.000 0.205 0.00 -5.00 PATCH_UNIFIED_DIFF
0.074 0.197 0.000 1.00 2.00 RATWARE_VC_IPA
0.187 0.426 0.043 0.91 2.49 DOMAIN_BODY
0.072 0.191 0.000 1.00 1.00 INCOME
0.200 0.447 0.051 0.90 1.00 NUMERIC_COMMENT
0.186 0.421 0.044 0.90 1.00 EXPECT_TO_EARN
0.071 0.187 0.000 1.00 1.00 PORN_MEMBERSHIP
0.070 0.187 0.000 1.00 3.52 FROM_BTAMAIL
0.086 0.220 0.005 0.98 1.00 FREE_CELL_PHONE
0.116 0.000 0.186 0.00 -6.00 EGP_HTML_BANNER
0.070 0.186 0.000 1.00 1.00 UNCLAIMED_MONEY
0.128 0.306 0.020 0.94 3.97 MSG_ID_ADDED_BY_MTA
1.100 1.874 0.632 0.75 0.18 US_DOLLARS_2
0.074 0.194 0.002 0.99 1.14 NOT_INTENDED
3.463 2.090 4.294 0.33 -2.00 EXCHANGE_SERVER
0.067 0.179 0.000 1.00 2.27 X_STORMPOST_TO
0.072 0.188 0.002 0.99 1.00 USER_4U2
0.075 0.194 0.003 0.99 1.00 MIME_BOUND_HEX_24
0.065 0.173 0.000 1.00 1.80 EXCUSE_5
0.106 0.259 0.014 0.95 1.00 LONG_DISTANCE
4.292 5.927 3.303 0.64 0.49 DEAR_SOMEBODY
0.108 0.263 0.015 0.95 1.00 HTML_COMMENT_SAVED_URL
0.100 0.245 0.011 0.96 1.00 FREE_SAMPLE
0.069 0.180 0.002 0.99 2.57 IMPOTENCE
0.184 0.408 0.048 0.89 0.77 SEE_FOR_YOURSELF
1.446 0.680 1.909 0.26 -1.00 USER_AGENT_MOZILLA_XM
0.068 0.179 0.002 0.99 1.00 EARNINGS
0.107 0.259 0.015 0.94 1.00 WINNER
0.271 0.049 0.405 0.11 -1.00 APPROVED_BY
0.063 0.166 0.001 1.00 1.00 MIME_BOUND_DIGITS_7
0.152 0.346 0.034 0.91 2.18 EXCUSE_6
0.064 0.168 0.001 0.99 2.00 RATWARE_GR
0.091 0.225 0.010 0.96 2.90 EXCUSE_12
0.059 0.156 0.000 1.00 3.00 RATWARE_OE_PI
0.134 0.309 0.028 0.92 3.42 NIGERIAN_SCAM_4
0.067 0.173 0.003 0.99 1.47 FROM_NO_USER
0.799 1.388 0.442 0.76 0.03 RELAYING_FRAME
0.056 0.148 0.000 1.00 2.00 RATWARE_STORM
0.124 0.007 0.195 0.03 -1.00 USER_AGENT_ENTOURAGE
0.055 0.147 0.000 1.00 2.00 RATWARE_JIXING
1.242 0.584 1.640 0.26 -0.50 SUBJECT_MONTH
0.054 0.143 0.000 1.00 2.00 RATWARE_SCREWUP_1
0.057 0.150 0.001 0.99 2.00 READ_TO_END
0.102 0.242 0.017 0.94 1.00 NIGERIAN_SCAM_14
0.087 0.000 0.139 0.00 -2.00 BUGZILLA_BUG
0.098 0.234 0.016 0.94 1.00 RICH
0.101 0.003 0.160 0.02 -2.40 YAHOO_MSGID_ADDED
0.052 0.138 0.001 1.00 1.00 MICROSOFT
0.082 0.000 0.132 0.00 -10.00 GENUINE_EBAY_RCVD
0.054 0.141 0.002 0.99 1.00 BREAKTHROUGH_2
0.107 0.249 0.021 0.92 1.00 MIME_BOUND_DIGITS_3
0.048 0.128 0.000 1.00 1.00 HARDCORE_PORN
0.413 0.771 0.196 0.80 0.50 FROM_AND_TO_SAME_5
0.048 0.126 0.000 1.00 2.00 RATWARE_CURMAIL
0.048 0.126 0.000 1.00 1.00 HR_4176
1.633 2.477 1.122 0.69 0.39 GAPPY_TEXT
3.030 4.202 2.322 0.64 1.00 FREE_MONEY
0.047 0.124 0.000 1.00 1.00 CONFIDENTIAL_ORDER
0.052 0.135 0.002 0.99 1.00 CELEBRITY_PORN
0.572 1.012 0.305 0.77 0.29 X_MSMAIL_PRIORITY_HIGH
0.058 0.148 0.004 0.98 1.00 BANKRUPTCY
0.197 0.409 0.069 0.85 0.97 GAPPY_SUBJECT
0.065 0.161 0.007 0.96 1.00 NO_COMBINE
1.727 2.568 1.218 0.68 1.00 FREE_QUOTE
0.050 0.130 0.002 0.98 1.00 NIGERIAN_TRANSACTION_2
0.078 0.187 0.012 0.94 1.00 MIME_BOUND_DIGITS_2
0.172 0.362 0.056 0.87 1.00 SIGN_UP
0.048 0.125 0.002 0.99 0.10 MICROSOFT_EXECUTABLE
0.048 0.124 0.002 0.99 1.00 BUY_DIRECT
0.049 0.127 0.002 0.98 1.00 AMATEUR_PORN
0.042 0.111 0.000 1.00 1.00 CABLE_CONVERTER
0.114 0.257 0.028 0.90 2.41 NO_QS_ASKED
0.043 0.113 0.001 1.00 1.00 HERBAL_VIAGRA
0.041 0.109 0.000 1.00 1.00 MIME_BOUND_MA
0.041 0.109 0.000 1.00 2.00 RATWARE_MMAILER
0.045 0.117 0.001 0.99 1.00 RATWARE_OE_MALFORMED
0.040 0.107 0.000 1.00 3.65 FAKED_IP_IN_RCVD
0.056 0.141 0.005 0.96 1.00 NAME_BRAND
0.098 0.223 0.022 0.91 3.46 DATE_IN_FUTURE_96_XX
0.081 0.190 0.015 0.93 1.00 GETAWAY
0.128 0.278 0.037 0.88 1.00 FREE_DVD
0.040 0.106 0.001 1.00 1.00 FORWARD_LOOKING
0.164 0.342 0.056 0.86 1.00 RESERVES_RIGHT
0.038 0.102 0.000 1.00 2.00 RATWARE_EVAMAIL
0.158 0.332 0.053 0.86 1.00 T_GET_PAID
0.158 0.332 0.053 0.86 1.00 GET_PAID
0.038 0.101 0.000 1.00 4.46 ADDRESSES_ON_CD
0.101 0.226 0.025 0.90 1.00 SPAM_PHRASES_020
0.041 0.107 0.001 0.99 2.00 RATWARE_SCREWUP_2
0.391 0.709 0.198 0.78 1.40 FROM_AND_TO_SAME_6
0.113 0.248 0.031 0.89 4.27 NIGERIAN_SCAM_2
0.054 0.134 0.006 0.96 2.60 REPLY_TO_EMPTY
0.056 0.138 0.007 0.95 3.06 BILLION_DOLLARS
0.034 0.092 0.000 1.00 2.00 RATWARE_YAM
0.038 0.098 0.001 0.99 3.08 HTTP_CTRL_CHARS_HOST
0.036 0.094 0.001 0.99 2.96 STOCK_ALERT
0.037 0.098 0.001 0.99 1.00 NO_INVENTORY
0.050 0.124 0.005 0.96 1.00 RAPE
0.387 0.690 0.203 0.77 1.00 FREQ_SPAM_PHRASE
0.138 0.289 0.046 0.86 2.33 NIGERIAN_SCAM_5
0.032 0.086 0.000 1.00 2.00 RATWARE_PASCUAL
0.033 0.087 0.001 0.99 1.00 MEET_SINGLES
0.031 0.083 0.000 1.00 1.00 CREDITORS_CALLING
0.033 0.086 0.001 0.99 1.00 HIDDEN_CHARGES
0.082 0.185 0.020 0.90 1.00 TARGETED
0.031 0.081 0.000 1.00 4.46 PRODUCED_AND_SENT_OUT
0.244 0.462 0.113 0.80 1.62 ALL_CAPS_HEADER
0.048 0.000 0.076 0.00 -1.00 USER_AGENT_IMP
1.424 0.856 1.767 0.33 -0.50 SUBJECT_MONTH_2
0.030 0.079 0.001 0.99 1.00 STOP_SNORING
0.039 0.098 0.004 0.96 1.00 FRIEND_AT_PUBLIC
0.029 0.077 0.001 0.99 3.09 X_PMFLAGS_PRESENT
0.114 0.238 0.038 0.86 1.00 T_MEMBER
0.027 0.072 0.000 1.00 2.56 MDAEMON_2_7_4
0.041 0.101 0.005 0.96 1.00 NO_STRINGS
0.026 0.070 0.000 1.00 2.00 RATWARE_IMKTG
0.041 0.100 0.005 0.96 1.00 PROMOTION
0.111 0.232 0.038 0.86 1.81 PROFITS
0.648 1.031 0.416 0.71 1.00 FREE_TRIAL
0.026 0.068 0.000 1.00 4.85 FORGED_GW05_RCVD
0.025 0.067 0.000 1.00 4.00 VAR_REF_IN_RECEIVED
0.025 0.067 0.000 1.00 2.00 RATWARE_XMAILER
0.025 0.067 0.000 1.00 1.00 PORN_PASSWORD
0.026 0.068 0.001 0.99 1.00 X_LIST_HOST
0.024 0.063 0.000 1.00 2.94 PENNIES_A_DAY
0.584 0.932 0.373 0.71 1.00 HTML_FONT_FACE_CAPS
0.256 0.463 0.132 0.78 1.00 SOCIAL_SEC_NUMBER
0.035 0.086 0.004 0.96 3.78 STOCK_PICK
0.025 0.066 0.001 0.99 1.00 HTML_COMMENT_8BITS
0.037 0.092 0.005 0.95 1.00 CANCEL
0.079 0.171 0.023 0.88 1.00 WINNER_CAP
0.128 0.255 0.052 0.83 2.50 LARGE_HEX
0.084 0.013 0.126 0.09 -1.00 USER_AGENT_APPLEMAIL
0.036 0.000 0.057 0.00 -4.00 Q_FOR_SELLER
0.023 0.061 0.001 0.99 1.00 CREDIT_BUREAU
0.028 0.072 0.002 0.97 1.00 SEX_FEST
0.023 0.060 0.001 0.99 2.47 UNIVERSITY_DIPLOMAS
0.021 0.056 0.000 1.00 2.00 RATWARE_EPAPER
0.022 0.059 0.001 0.99 1.00 FREE_LEADS
0.021 0.054 0.000 1.00 1.00 EXCUSE_18
0.268 0.468 0.147 0.76 0.10 MSGID_CHARS_WEIRD
0.020 0.052 0.000 1.00 2.00 RATWARE_SEEDNET
0.033 0.080 0.005 0.94 1.00 SUPPLIES_LIMITED
0.022 0.058 0.001 0.98 1.00 COUPON
0.068 0.146 0.021 0.87 0.80 DOMAIN_SUBJECT
0.156 0.292 0.074 0.80 1.00 NIGERIAN_TRANSACTION_1
0.026 0.065 0.003 0.96 1.00 SECRET_RECORD
0.018 0.047 0.000 1.00 1.00 X_MESSAGE_ID
0.068 0.144 0.022 0.87 1.00 WINNING_CAP
0.019 0.048 0.001 0.99 1.00 GET_STARTED_NOW
0.019 0.048 0.001 0.99 1.00 SERIOUS_CASH
0.041 0.094 0.009 0.91 1.00 SHOPPING_SPREE
0.046 0.102 0.011 0.90 1.00 GET_IT_NOW
0.024 0.060 0.003 0.96 1.00 LUXURY_CAR
0.255 0.434 0.146 0.75 2.48 DATE_IN_PAST_48_96
0.016 0.042 0.000 1.00 1.00 ONLINE_BIZ_OPS
0.015 0.040 0.000 1.00 1.00 MIME_BOUND_MAIL_BOUND
0.015 0.040 0.000 1.00 1.00 NO_AGE
0.015 0.039 0.000 1.00 3.66 MICRO_CAP_WARNING
0.027 0.064 0.004 0.94 1.00 HTML_WIN_BLUR
0.014 0.038 0.000 1.00 1.00 MIME_BOUND_MIME_BOUND
0.014 0.038 0.000 1.00 2.09 WWW_CLIK4YOU_COM
0.021 0.051 0.002 0.96 1.00 AMAZING_STUFF
0.033 0.076 0.007 0.91 0.69 NO_DISSAPOINTMENT
2.672 3.167 2.372 0.57 0.01 FORGED_RCVD_FOUND
0.042 0.092 0.011 0.89 0.10 MIME_SUSPECT_NAME
0.013 0.035 0.000 1.00 1.00 LESBIAN
0.013 0.035 0.000 1.00 2.00 RATWARE_POWERC
0.091 0.176 0.039 0.82 0.40 INCREASE_SOMETHING
0.022 0.000 0.035 0.00 -1.00 HOTMAIL_FOOTER3
0.026 0.062 0.005 0.93 1.00 ONE_HUNDRED_PC_GUAR
0.022 0.054 0.003 0.94 1.62 X_SMTPEXP_VERSION
0.026 0.061 0.005 0.93 2.62 NO_CATCH
0.013 0.034 0.000 1.00 1.41 S_1618
7.775 7.869 7.719 0.50 0.10 SPAM_PHRASE_03_05
0.050 0.105 0.017 0.86 1.00 MARKET_SOLUTION
0.016 0.040 0.001 0.97 1.00 MIME_BOUND_DIGITS_1
0.012 0.032 0.000 1.00 1.00 SPAM_PHRASES_040
0.015 0.038 0.001 0.97 3.05 INVESTOR_SPEC_SHEET
0.097 0.181 0.045 0.80 1.00 ONE_HUNDRED_PC_FREE
0.011 0.030 0.000 1.00 1.00 MIME_BOUND_SEP1
0.021 0.049 0.003 0.94 2.73 URGENT_BIZ
0.013 0.034 0.001 0.98 1.00 FREE_PREVIEW
0.015 0.037 0.001 0.97 1.97 SAFEGUARD_NOTICE
0.028 0.063 0.006 0.91 1.00 NIGERIAN_SCAM_10
0.011 0.029 0.000 1.00 1.00 PRIORITY_MAIL
0.016 0.040 0.002 0.96 1.00 CASH_BONUS
0.221 0.364 0.135 0.73 2.06 MSGID_CHARS_SPAM
0.604 0.373 0.744 0.33 -1.00 SUBJECT_FREQ
0.018 0.000 0.028 0.00 -5.00 PATCH_CONTEXT_DIFF
0.191 0.319 0.113 0.74 -2.50 FROM_AND_TO_SAME_2
0.018 0.044 0.003 0.94 1.00 PORN_13
3.266 2.477 3.742 0.40 -0.50 USER_AGENT_OUTLOOK
0.121 0.034 0.173 0.16 -1.00 HOTMAIL_FOOTER5
0.100 0.025 0.145 0.15 -1.00 MSN_FOOTER1
0.741 0.486 0.895 0.35 -0.10 SUBJECT_IS_NEWS
0.021 0.048 0.004 0.93 1.00 X_EM_REGISTRATION
0.010 0.027 0.000 1.00 3.00 SPAM_PHRASE_55_XX
0.015 0.037 0.002 0.96 2.90 MANY_FROMS
0.023 0.053 0.005 0.92 1.00 JOIN_MILLIONS
0.319 0.168 0.410 0.29 -1.00 ORDER_STATUS
0.019 0.045 0.003 0.93 1.00 NO_FORMS
0.019 0.045 0.003 0.93 1.00 BARELY_LEGAL
0.873 1.157 0.702 0.62 1.00 DISCLAIMER
0.135 0.052 0.186 0.22 -1.00 HOTMAIL_FOOTER1
0.040 0.085 0.014 0.86 1.00 INVALID_DATE_NO_TZ
0.016 0.000 0.026 0.00 -2.00 CRON_ENV
0.014 0.035 0.002 0.96 1.00 HTML_90_100
0.018 0.042 0.003 0.93 1.62 X_SMTPEXP_REGISTRATION
0.020 0.047 0.004 0.92 1.00 FREE_WEBSITE
0.332 0.497 0.232 0.68 1.00 US_DOLLARS_4
0.010 0.027 0.001 0.98 1.00 NIGERIAN_SCAM_9
0.020 0.046 0.004 0.92 1.00 NO_INVESTMENT
0.008 0.022 0.000 1.00 1.00 IN_ACCORDANCE_WITH_LAWS
0.014 0.000 0.022 0.00 1.00 SUBJECT_IS_IN_REVIEW
0.023 0.051 0.006 0.90 1.00 OFFSHORE_SCAM
0.008 0.022 0.000 1.00 2.00 RATWARE_DIFFOND
0.009 0.024 0.001 0.98 3.20 UNNEEDED_HTML_ENCODING
0.012 0.030 0.002 0.95 4.13 PURE_PROFIT
0.007 0.020 0.000 1.00 1.00 CHILD_SUPPORT
0.057 0.107 0.026 0.80 4.46 CHECK_OR_MONEY_ORDER
0.009 0.022 0.001 0.98 2.73 SERIOUS_ONLY
0.011 0.000 0.018 0.00 0.13 MAJORDOMO
0.008 0.022 0.001 0.98 1.68 EXCUSE_11
0.069 0.126 0.035 0.78 2.35 DEAR_EMAIL
0.007 0.017 0.000 1.00 1.00 MARKUP_SSPL
0.007 0.017 0.000 1.00 1.00 INTL_EXEC_GUILD
0.006 0.016 0.000 1.00 1.00 MIME_BOUND_HEX14
0.061 0.111 0.030 0.78 1.00 INVESTMENT
0.108 0.047 0.145 0.24 -2.00 FORGOTTEN_PASSWORD
0.006 0.015 0.000 1.00 2.26 CHARSET_FARAWAY_BODY
0.006 0.015 0.000 1.00 3.06 X_MAILER_GIBBERISH
0.060 0.107 0.031 0.78 1.00 REALLY_UNSAFE_JAVASCRIPT
0.010 0.023 0.002 0.94 1.00 NEW_CUSTOMER
0.005 0.014 0.000 1.00 1.00 NATURAL_VIAGRA
0.005 0.014 0.000 1.00 2.00 RATWARE_HSU
0.125 0.200 0.079 0.72 1.00 MARKETING
0.063 0.111 0.033 0.77 2.60 COMMUNIGATE
0.137 0.216 0.089 0.71 1.00 GIFT_CERTIFICATE
0.105 0.173 0.065 0.73 2.60 SEARCH_ENGINE_PROMO
0.008 0.000 0.013 0.00 -3.00 MSN_GROUPS
1.203 1.033 1.306 0.44 -1.00 FROM_NEWS_LIST
0.025 0.051 0.010 0.84 1.00 DATE_IN_FUTURE
0.005 0.012 0.000 1.00 3.21 SPAM_FORM_RETURN
0.005 0.012 0.000 1.00 2.00 RATWARE_CHARSET
0.010 0.023 0.002 0.92 0.55 TRACE_BY_SSN
0.077 0.129 0.046 0.74 3.01 NIGERIAN_SCAM_6
0.013 0.029 0.004 0.89 1.00 DRASTIC_REDUCED
0.031 0.009 0.044 0.16 -1.00 REG_THANKS
0.004 0.010 0.000 1.00 1.00 YOU_HAVE_BEEN_SELECTED
0.004 0.010 0.000 1.00 2.00 RATWARE_CARETOP
0.004 0.010 0.000 1.00 2.00 RATWARE_OPTIN
0.018 0.036 0.006 0.85 1.00 NO_REFUND
0.074 0.122 0.045 0.73 4.43 FORGED_JUNO_RCVD
0.004 0.010 0.000 1.00 1.00 NO_GIMMICK
0.047 0.082 0.026 0.76 1.00 X_EM_VER_PRESENT
0.679 0.799 0.606 0.57 0.40 SUBJ_MISSING
0.003 0.009 0.000 1.00 1.00 SPAM_FORM_ACTION
0.003 0.009 0.000 1.00 1.93 PORN_6
0.005 0.012 0.001 0.96 1.00 PHONE_CANCER
0.005 0.000 0.008 0.00 -4.00 MAILMAN_CONFIRM
0.035 0.062 0.018 0.77 1.00 PORN_12
0.005 0.011 0.001 0.96 1.00 MIME_BOUND_EQS_DASHES
0.003 0.008 0.000 1.00 2.00 RATWARE_EBIZ
0.003 0.008 0.000 1.00 1.00 WWW_AUTOREMOVE_COM
0.140 0.203 0.102 0.66 0.10 JAVASCRIPT_URI
0.574 0.670 0.515 0.57 1.00 NIGERIAN_SCAM_15
0.019 0.036 0.008 0.81 1.00 SEDUCTION
0.003 0.007 0.000 1.00 2.00 RATWARE_CBLAST
0.003 0.007 0.000 1.00 2.00 RATWARE_MATCHMAKER
0.003 0.007 0.000 1.00 1.00 PORN_GALLERIES
0.003 0.007 0.000 1.00 1.00 NO_MEDICAL
0.012 0.024 0.004 0.85 1.00 TO_UNSUB_REPLY
0.007 0.015 0.002 0.90 1.00 PORN_9
0.048 0.079 0.029 0.73 1.00 VERY_SUSP_CC_RECIPS
0.004 0.000 0.006 0.00 -10.00 EVITE
0.005 0.012 0.001 0.92 3.07 EXCUSE_2
0.002 0.006 0.000 1.00 1.00 BUY_JUDGEMENTS
0.002 0.006 0.000 1.00 2.00 RATWARE_LC_OUTLOOK
0.014 0.027 0.006 0.82 1.83 HTTP_NUMBER_WORD
0.004 0.009 0.001 0.94 3.60 EJACULATION
0.002 0.005 0.000 1.00 1.00 MORE_TRAFFIC
0.002 0.005 0.000 1.00 1.00 DISCONTINUE
0.017 0.005 0.024 0.18 -1.00 USER_AGENT_MACOE
0.009 0.019 0.004 0.84 1.00 LYING_EYES
0.034 0.055 0.020 0.73 3.09 DATE_WARNING
0.002 0.004 0.000 1.00 1.00 NIGERIAN_SCAM_3
0.002 0.004 0.000 1.00 1.00 PORN_7
0.002 0.004 0.000 1.00 1.00 REMOVE_ES_01
0.008 0.016 0.003 0.84 1.75 AUTO_EMAIL_REMOVAL
0.021 0.008 0.028 0.22 -1.00 HOTMAIL_FOOTER4
0.007 0.015 0.003 0.85 1.80 ONCE_IN_LIFETIME
0.006 0.013 0.002 0.86 1.00 CENTS_ON_DOLLAR
0.002 0.000 0.004 0.00 -1.00 USER_AGENT_TONLINE
0.041 0.064 0.028 0.70 1.00 MIME_BOUND_DIGITS_5
0.013 0.004 0.019 0.19 -5.00 LISTBUILDER
0.001 0.003 0.000 1.00 1.00 X_FIX_PRESENT
0.001 0.003 0.000 1.00 1.00 FREE_FLAG
0.001 0.003 0.000 1.00 1.00 SPAM_FORM_INPUT
0.001 0.003 0.000 1.00 2.66 POST_IN_RCVD
0.001 0.003 0.000 1.00 0.22 SPAM_FORM
0.001 0.003 0.000 1.00 1.00 OUTSTANDING_VALUE
0.001 0.003 0.000 1.00 3.50 CYBER_FIRE_POWER
0.001 0.003 0.000 1.00 1.00 FROM_TOPICA
0.001 0.003 0.000 1.00 1.00 INTERNET_TERROR_RANT
0.001 0.003 0.000 1.00 1.00 RATWARE_39
0.001 0.003 0.000 1.00 1.00 FREE_PRIORITY_MAIL
0.023 0.039 0.014 0.73 1.00 CANT_LIVE_WITHOUT
0.178 0.219 0.153 0.59 1.00 NIGERIAN_SCAM_16
0.003 0.006 0.001 0.92 1.00 NIGERIAN_SCAM_13
1.348 1.256 1.404 0.47 1.00 PLING
0.014 0.024 0.007 0.77 1.00 ALL_CAPS_SUBJECT
0.145 0.180 0.124 0.59 1.00 DIFFERENT_REPLY_TO
0.186 0.150 0.208 0.42 -1.00 ACCOUNT_CLICK
0.101 0.130 0.084 0.61 1.00 SLIGHTLY_UNSAFE_JAVASCRIPT
0.011 0.020 0.006 0.76 1.00 T_FREE_TICKETS
0.123 0.150 0.107 0.58 0.50 COMMENT
0.001 0.002 0.000 1.00 1.00 FROM_UGETMORE
0.001 0.002 0.000 1.00 1.00 EXCUSE_ES_01
0.001 0.002 0.000 1.00 1.00 NO_MIDDLEMAN
0.001 0.002 0.000 1.00 1.00 WWW_TRAFFICWOW_NET
0.001 0.002 0.000 1.00 2.00 RATWARE_UPROAR
29.457 16.905 37.045 0.31 -0.20 SPAM_PHRASE_00_01
8.700 3.513 11.835 0.23 -0.10 SPAM_PHRASE_01_02
6.512 3.642 8.247 0.31 -0.05 SPAM_PHRASE_02_03
0.000 0.000 0.000 0.00 -100.00 USER_IN_ALL_SPAM_TO
0.000 0.000 0.000 0.00 100.00 USER_IN_BLACKLIST
0.000 0.000 0.000 0.00 -20.00 USER_IN_MORE_SPAM_TO
0.000 0.000 0.000 0.00 -6.00 USER_IN_WHITELIST_TO
0.000 0.000 0.000 0.00 2.00 UNDESIRED_LANGUAGE_BODY
0.000 0.000 0.000 0.00 -10.00 NMS_CGI_NOT_BUGGY
0.001 0.003 0.000 1.00 -0.50 FROM_US_PHONE

--------END OF GOOD TESTS---------

These are the ones to be dropped due to FPs (false positives):


0.008 0.014 0.004 0.77 1.00 FREE_HOSTING
0.035 0.048 0.027 0.64 1.56 DONT_DELETE
0.307 0.301 0.311 0.49 -1.00 SUBJECT_HAS_DATE
0.005 0.010 0.003 0.78 2.36 WWW_REMOVEYOU_COM
0.489 0.466 0.503 0.48 1.71 SPAM_REDIRECTOR
0.062 0.078 0.052 0.60 1.76 PLEASE_READ
0.045 0.059 0.037 0.61 1.00 SUSPICIOUS_CC_RECIPS
0.002 0.003 0.001 0.87 2.10 LONG_NUMERIC_HTTP_ADDR
0.005 0.009 0.003 0.77 1.00 INCREASE_TRAFFIC
0.060 0.073 0.052 0.58 0.57 CASINO
0.534 0.593 0.499 0.54 -1.00 FWD_MSG
0.111 0.099 0.118 0.46 -1.00 BALANCE_FOR_LONG_40K
0.038 0.048 0.031 0.61 1.00 BIG_BUCKS
0.000 0.001 0.000 1.00 1.00 GREEN_EXCUSE_2
0.000 0.001 0.000 1.00 1.00 GREEN_EXCUSE_1
0.000 0.001 0.000 1.00 1.00 CLICK_TO_REMOVE_3
0.000 0.001 0.000 1.00 1.00 PSYCHIC
0.000 0.001 0.000 1.00 4.20 YR_MEMBERSHIP_EXCH
0.000 0.001 0.000 1.00 1.00 MYCASINOBUILDER
0.000 0.001 0.000 1.00 1.00 INCREDIBLE
0.000 0.001 0.000 1.00 1.00 THIS_AINT_JUNK
0.002 0.004 0.001 0.81 1.00 MURKOWSKI_CRUFT
0.119 0.124 0.116 0.52 1.00 PORN_11
0.094 0.100 0.090 0.53 1.00 KNOWN_BAD_DIALUPS
0.048 0.056 0.044 0.56 1.00 MIME_BOUND_DIGITS_6
0.002 0.001 0.003 0.22 -1.00 TRACK_NUMBER
0.006 0.009 0.004 0.67 1.00 SIGNIFICANT
0.003 0.004 0.002 0.73 1.00 PORN_1
0.004 0.006 0.003 0.70 1.00 MONSTERHUT
0.067 0.067 0.066 0.50 1.00 FROM_NAME_EQ_FROM_ADDR
0.553 0.713 0.457 0.61 -1.00 BALANCE_FOR_LONG_20K
0.243 0.202 0.268 0.43 1.00 SUBJ_HAS_Q_MARK
0.315 0.253 0.352 0.42 1.00 PORN_3
0.264 0.217 0.292 0.43 1.00 PORN_10
0.004 0.006 0.003 0.66 2.00 RATWARE_YMR
0.018 0.021 0.017 0.55 1.00 RATWARE
0.002 0.003 0.002 0.69 1.00 X_ANTIABUSE
0.002 0.003 0.001 0.71 2.00 RATWARE_ANSMTP
0.002 0.003 0.001 0.71 1.00 CHANGE_TERMS
0.119 0.103 0.128 0.44 1.00 PORN_14
1.422 0.829 1.780 0.32 1.00 DOUBLE_CAPSWORD
0.023 0.022 0.023 0.49 1.00 FREE_TICKETS
1.297 1.989 0.878 0.69 -0.30 OUTLOOK_FW_MSG
0.002 0.003 0.002 0.62 1.00 SLASH_PRICE
0.015 0.015 0.015 0.50 5.00 FORGED_EBAY_RCVD
9.021 15.707 4.979 0.76 -1.00 USER_AGENT_OE
3.676 1.786 4.819 0.27 1.11 TO_MALFORMED
4.658 2.188 6.151 0.26 1.00 TO_BE_REMOVED_REPLY
0.215 0.136 0.262 0.34 1.00 T_FREE_WEBSITE
0.537 0.303 0.679 0.31 3.67 FROM_MALFORMED
0.806 0.429 1.034 0.29 -0.97 MIME_NULL_BLOCK
0.002 0.003 0.002 0.55 2.00 RATWARE_CSMTP
1.134 0.567 1.476 0.28 1.00 X_NOT_PRESENT
0.003 0.003 0.003 0.52 1.00 PRICES_WONT_LAST
0.275 0.160 0.345 0.32 1.00 HTML_COMMENT_UNIQUE_ID
0.140 0.203 0.102 0.67 -1.00 FAILURE_NOTICE_2
0.456 0.247 0.582 0.30 1.00 NIGERIAN_SCAM_12
0.149 0.221 0.105 0.68 -1.00 FAILURE_NOTICE_1
0.278 0.155 0.352 0.31 1.00 FROM_AND_TO_SAME
1.097 0.513 1.450 0.26 1.00 FROM_NAME_NO_SPACES
0.031 0.041 0.024 0.63 -1.00 T_NASD_FINANCIAL
0.041 0.057 0.031 0.65 -1.00 PRIVACY_STATEMENT
0.004 0.003 0.005 0.42 1.00 INCREASE_SALES
1.282 2.342 0.642 0.78 -0.10 USER_AGENT_AOL
1.741 0.606 2.427 0.20 0.48 TO_LOCALPART_EQ_REAL
0.001 0.001 0.001 0.45 1.00 REMOVE_ES_03
1.497 0.451 2.130 0.17 1.00 SUSPECT_LIST_HEADERS
0.034 0.015 0.046 0.24 -2.50 FROM_AND_TO_SAME_4
9.272 11.011 8.222 0.57 0.38 SUPERLONG_LINE
0.341 0.681 0.136 0.83 -1.00 MAILER_DAEMON
0.033 0.013 0.045 0.22 1.00 URI_IS_POUND
0.008 0.003 0.011 0.23 0.50 OUTLOOK_UNDISC_RECIPS
0.820 1.744 0.262 0.87 -1.00 USER_AGENT_THEBAT
1.061 0.134 1.621 0.08 1.00 SUBJ_ENDS_IN_Q_MARK
2.961 0.200 4.630 0.04 4.26 FROM_MISSING
0.046 0.002 0.073 0.02 1.00 SIGNATURE_DELIM

these ones are to be dropped due to lousy hit-rates:

0.008 0.003 0.011 0.18 1.00 REAL_THING
0.000 0.000 0.000 0.00 1.00 SELECTED
0.004 0.000 0.006 0.00 1.00 NIGERIAN_SCAM_11
0.000 0.000 0.001 0.00 1.00 EXCUSE_ES_02
0.000 0.000 0.000 0.00 1.00 WEALTH
0.000 0.000 0.000 0.00 1.00 EXCUSE_ES_03
0.000 0.000 0.000 0.00 1.00 POPLAUNCH
0.000 0.000 0.000 0.00 1.00 BREAKTHROUGH
0.000 0.000 0.000 0.00 1.00 RATWARE_38
0.000 0.000 0.000 0.00 1.00 RATWARE_37
0.000 0.000 0.000 0.00 1.00 RATWARE_36
0.000 0.000 0.000 0.00 1.00 VIGORA
0.000 0.000 0.000 0.00 1.00 RATWARE_35
0.000 0.000 0.000 0.00 1.00 IRS
0.000 0.000 0.000 0.00 1.00 RATWARE_34
0.000 0.000 0.000 0.00 1.00 CLICKSFORMONEY_NET
0.000 0.000 0.000 0.00 1.00 RATWARE_33
0.003 0.009 0.000 1.00 0.00 TO_INVESTORS
0.000 0.000 0.000 0.00 1.00 RATWARE_32
0.000 0.000 0.000 0.00 1.00 RATWARE_31
0.000 0.000 0.000 0.00 4.25 SHORT_RECEIVED_LINE
0.000 0.000 0.000 0.00 1.00 SPY_ON_FRIENDS
0.000 0.000 0.000 0.00 1.00 RATWARE_30
0.000 0.000 0.000 0.00 -1.00 MSN_FOOTER2
0.000 0.000 0.000 0.00 1.00 FILTERED_BY_WORLDREMOVE
0.000 0.000 0.000 0.00 1.00 NIGERIAN_SCAM
0.000 0.000 0.000 0.00 1.00 EU_200_32_CE
0.000 0.000 0.000 0.00 4.00 BUGGY_CGI_DE
0.000 0.000 0.000 0.00 1.00 BUGGY_CGI_PT_2
0.000 0.000 0.000 0.00 1.00 RATWARE_27
0.000 0.000 0.000 0.00 1.00 RATWARE_26
0.000 0.000 0.000 0.00 1.00 RATWARE_25
0.000 0.000 0.000 0.00 1.00 RATWARE_24
0.000 0.000 0.000 0.00 4.00 STAINLESS_STEEL
0.000 0.000 0.000 0.00 1.00 RATWARE_21
0.000 0.000 0.000 0.00 1.00 RATWARE_20
0.000 0.000 0.000 0.00 1.00 WWW_NETSITESFORFREE_NET
0.000 0.000 0.000 0.00 1.00 BACKED_BY
0.000 0.000 0.000 0.00 1.00 RATWARE_19
0.000 0.000 0.000 0.00 1.00 RATWARE_18
0.000 0.000 0.000 0.00 1.00 RATWARE_16
0.000 0.000 0.000 0.00 1.00 FREEWEBHOSTINGCENTRAL
0.000 0.000 0.000 0.00 1.00 RATWARE_15
0.000 0.000 0.000 0.00 1.00 RATWARE_14
0.000 0.000 0.000 0.00 1.00 PRINT_OUT_AND_FAX
0.000 0.000 0.000 0.00 1.00 RATWARE_13
0.000 0.000 0.000 0.00 1.00 RATWARE_12
0.000 0.000 0.000 0.00 1.00 BUGGY_CGI_ES
0.000 0.000 0.000 0.00 1.00 RATWARE_10
0.000 0.000 0.000 0.00 1.00 RATWARE_11
0.000 0.000 0.000 0.00 1.00 T_SUBJ_ISO885915
0.000 0.000 0.000 0.00 2.00 RATWARE_MAMA
0.000 0.000 0.000 0.00 4.00 BUGGY_CGI_DE_3
0.000 0.000 0.000 0.00 1.00 WWW_DIRECTFORCEMARKETING_COM
0.000 0.000 0.000 0.00 1.00 FREEWEBCO_NET_URL
0.001 0.000 0.001 0.00 1.80 EU_EMAIL_OPTOUT
0.000 0.000 0.000 0.00 2.67 CORRUPT_MSGID
0.000 0.000 0.000 0.00 4.00 BUGGY_CGI_PT
0.000 0.000 0.000 0.00 1.00 FREEMEGS_URL
0.029 0.078 0.000 1.00 -1.00 USER_AGENT_GNUS_XM
0.000 0.000 0.000 0.00 1.00 EXCUSE_8
0.000 0.000 0.000 0.00 1.00 RATWARE_40
0.000 0.000 0.000 0.00 1.00 ITS_EFFECTIVE
0.000 0.000 0.000 0.00 1.00 SHOES_GUY
0.000 0.000 0.000 0.00 1.00 REMOVE_ES_07
0.000 0.000 0.000 0.00 1.00 REMOVE_ES_06
0.000 0.000 0.000 0.00 1.00 WEB4PORNO_URL
0.000 0.000 0.000 0.00 1.00 RATWARE_00
0.000 0.000 0.000 0.00 1.00 YELLOWSUN
0.001 0.000 0.001 0.00 2.00 RATWARE_NETMAILER
0.000 0.000 0.000 0.00 1.00 RATWARE_01
0.000 0.000 0.000 0.00 1.00 BRAND_NEW_PAGER
0.000 0.000 0.000 0.00 3.46 FAKED_UNDISC_RECIPS_AT
0.002 0.005 0.000 1.00 -1.00 MAILBITS_EMAIL
0.000 0.000 0.000 0.00 1.00 REMOVE_ES_05
0.000 0.000 0.000 0.00 1.00 RATWARE_02
0.000 0.000 0.000 0.00 1.00 RATWARE_05
0.000 0.000 0.000 0.00 1.00 REMOVE_ES_04
0.000 0.000 0.000 0.00 1.00 NIGERIAN_SCAM_8
0.000 0.000 0.000 0.00 1.00 RATWARE_07
0.000 0.000 0.001 0.00 1.00 REMOVE_ES_02
0.000 0.000 0.000 0.00 1.00 E_WEBHOSTCENTRAL_URL
0.000 0.000 0.000 0.00 2.00 RATWARE_COGNI
0.000 0.000 0.000 0.00 1.00 NO_SELLING
0.000 0.000 0.000 0.00 2.00 RATWARE_HASH_1
0.000 0.000 0.000 0.00 1.00 RATWARE_45
0.000 0.000 0.000 0.00 -1.00 USER_AGENT_GNUS_UA
0.000 0.000 0.001 0.00 1.00 SUBSCRIBE_ES_01
0.000 0.000 0.000 0.00 1.00 RATWARE_43
0.000 0.000 0.000 0.00 1.00 RATWARE_42
0.000 0.000 0.000 0.00 1.00 RATWARE_41
0.000 0.000 0.000 0.00 1.00 CLICK_TO_REMOVE_MAILTO
0.001 0.000 0.001 0.00 1.00 BONUS_PAYMENT
0.000 0.000 0.000 0.00 1.00 ANOTHER_NET_AD
0.000 0.000 0.000 0.00 1.00 LIFE_INSURANCE
0.000 0.000 0.000 0.00 4.00 BUGGY_CGI_DE_2
0.000 0.000 0.000 0.00 1.00 T_FROM_ISO885915
0.000 0.000 0.000 0.00 1.00 NO_SPENDING
0.002 0.000 0.003 0.00 1.00 T_MONTH_TRIAL
0.000 0.000 0.001 0.00 1.00 T_FREE_HOSTING
0.004 0.000 0.006 0.00 1.00 T_MEMBER_2
0.000 0.000 0.001 0.00 1.00 T_SAVINGS
0.000 0.000 0.000 0.00 3.90 LASER_PRINTER
0.001 0.000 0.002 0.00 1.00 A_HREF_TO_REMOVE
0.000 0.000 0.001 0.00 1.00 T_FREE_INSTALL
0.001 0.000 0.002 0.00 1.00 T_SUBJ_FREE_CAP
0.001 0.000 0.002 0.00 1.00 PORN_8
0.018 0.000 0.028 0.00 1.00 T_TRADEMARK
0.042 0.000 0.067 0.00 1.00 UNIFIED_PATCH
0.004 0.000 0.006 0.00 1.00 T_USER_4U2
0.000 0.000 0.001 0.00 1.00 T_FREE_ACCESS
0.001 0.000 0.001 0.00 1.00 A_HREF_TO_UNSUB
0.008 0.000 0.013 0.00 1.00 T_DOMAIN_4U2
0.000 0.000 0.001 0.00 1.00 T_UNLIMITED

--j.
--
'Justin Mason' => { url => http://jmason.org/ , blog => http://taint.org/ }


-------------------------------------------------------
This sf.net email is sponsored by: OSDN - Tired of that same old
cell phone? Get a new here for FREE!
https://www.inphonic.com/r.asp?r=sourceforge1&refcode1=vs3390
Malte S. Stretz
2002-08-20 14:04:21 UTC
Permalink
Post by Justin Mason
[...]
That leaves 708 tests, and drops about 200 tests.
comments? protests? ;)
Some... comments :)

First, it seems like somebody didn't run the tests with the frozen recent
ruleset; T_*_4U2 were moved (and renamed) from 70_cvs_rules_under_test.cf
some time before.

The compensate rule USER_AGENT should go or at least score very low; with
it, a spammer just has to include a header like
| User-Agent: KMail/1.4.3
and he'll get _two_ good points. Alternatively it might become a meta rule
but I don't know if it's worth it.
| meta USER_AGENT exists:User-Agent && ! USER_AGENT_KMAIL &&
| ! USER_AGENT_MOZILLA_UA && ...

The compensate NOSPAM_INC is very easy to fake and should score lower, maybe
0.25. This is true for all other easy-to-fake compensate rules.

I once went to improve MSG_ID_ADDED_BY_MTA_* (bug 552) but had a little
error in my code and then forgot about it. I'll have a look at this one for
the next release.

Another proof for mixed rulesets: GET_PAID vs. T_GET_PAID :)

WWW_CLIK4YOU_COM is caught by DOMAIN_4U2. I don't think it's worth its own
test with its low ratio.

USER_AGENT_GNUS seems to be broken with its zero hits :) I'll have a look
into it...
Post by Justin Mason
[...]
Malte
--
-- Coding is art.
--
Justin Mason
2002-08-20 14:23:22 UTC
Permalink
Post by Malte S. Stretz
First, it seems like somebody didn't run the tests with the frozen recent
ruleset; T_*_4U2 were moved (and renamed) from 70_cvs_rules_under_test.cf
some time before.
argh. Yes, I had to delete some mass-check outputs from some people using
VERY old test files, as well... must have missed one. :(
Post by Malte S. Stretz
The compensate rule USER_AGENT should go or at least score very low; with
it, a spammer just has to include a header like
BTW, the scores are irrelevant right now (unless the test is going to have
a non-GA'd score). The GA should reduce "easy" ones, and if it doesn't,
after the GA run is the time to comment. Don't pay any attention to them!
;)

Regarding broken tests (USER_AGENT_GNUS, some of the MIME_BOUNDS ones I
added et al). Just rehabilitate them *after* the release, ignore them for
now...

--j.


Malte S. Stretz
2002-08-20 15:16:35 UTC
Permalink
Post by Justin Mason
Post by Malte S. Stretz
First, it seems like somebody didn't run the tests with the frozen
recent ruleset; T_*_4U2 were moved (and renamed) from
70_cvs_rules_under_test.cf some time before.
argh. Yes, I had to delete some mass-check outputs from some people
using VERY old test files, as well... must have missed one. :(
Does this have much impact on the outcome, do you think? A script might be
useful which compares the logs to the current ruleset and barfs if it
finds rules there which don't exist (anymore)...
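
A rough sketch of what such a check could look like, in Python. It rests on
two assumptions that are not confirmed anywhere in this thread: that each log
line has four whitespace-separated fields (flag, score, path, comma-separated
list of rules that hit), and that rules are declared in the .cf files with
lines starting with a keyword such as header, body or meta. The function
names and the sample data are invented for illustration.

```python
import re

# Assumed rule-declaration keywords in .cf files (illustrative, not exhaustive).
RULE_DECL = re.compile(
    r"^\s*(?:header|body|rawbody|full|uri|meta)\s+([A-Z][A-Z0-9_]*)\b"
)


def rules_in_ruleset(cf_lines):
    """Collect rule names declared in an iterable of .cf lines."""
    names = set()
    for line in cf_lines:
        match = RULE_DECL.match(line)
        if match:
            names.add(match.group(1))
    return names


def rules_in_log(log_lines):
    """Collect rule names from log lines.

    Assumption: flag, score, path, then a comma-separated rule list.
    Lines with fewer than four fields are treated as having no hits.
    """
    names = set()
    for line in log_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        names.update(name for name in fields[-1].split(",") if name)
    return names


def unknown_rules(cf_lines, log_lines):
    """Return rule names that appear in the logs but not in the ruleset."""
    return sorted(rules_in_log(log_lines) - rules_in_ruleset(cf_lines))


if __name__ == "__main__":
    # Made-up sample data, only to show the call.
    ruleset = ["header DOMAIN_4U2 From =~ /4u2/i"]
    logs = ["Y 12 /corpus/spam/1 DOMAIN_4U2,T_DOMAIN_4U2"]
    stale = unknown_rules(ruleset, logs)
    if stale:
        print("rules in logs but not in ruleset:", ", ".join(stale))
```

Run over every submitted log before the results are merged, something like
this would flag logs produced with an outdated ruleset.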
Post by Justin Mason
Post by Malte S. Stretz
The compensate rule USER_AGENT should go or at least score very low;
with it, a spammer just has to include a header like
BTW, the scores are irrelevant right now (unless the test is going to
have a non-GA'd score). The GA should reduce "easy" ones, and if it
doesn't, after the GA run is the time to comment. Don't pay any
attention to them! ;)
I already thought so. This was just a comment so we won't forget it
afterwards.
Post by Justin Mason
Regarding broken tests (USER_AGENT_GNUS, some of the MIME_BOUNDS ones I
added et al). Just rehabilitate them *after* the release, ignore them
for now...
Of course. I was just announcing it :)

Malte
--
-- Coding is art.
--
Craig R.Hughes
2002-08-20 18:14:50 UTC
Permalink
Post by Justin Mason
Post by Malte S. Stretz
The compensate rule USER_AGENT should go or at least score very low; with
it, a spammer just has to include a header like
BTW, the scores are irrelevant right now (unless the test is going to have
a non-GA'd score). The GA should reduce "easy" ones, and if it doesn't,
after the GA run is the time to comment. Don't pay any attention to them!
;)
Remember that the GA is going to be considering combinatorial
uses of the rules, so rules which look dodgy on their own might
be gems for the GA -- perhaps something with a S/O ratio of .5
actually occurs often in combination with some other rule, and
in those situations, helps to distinguish spam vs nonspam.
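
A toy illustration of that point in Python. The corpus and the rule names
RULE_A and RULE_B are invented, and S/O is computed here as spam hits divided
by all hits on raw counts (the corpus has equal numbers of spam and nonspam,
so this agrees with a percentage-based ratio). It is only a sketch of the
argument, not of what the GA actually does.

```python
def s_o_ratio(messages, rule, given=None):
    """S/O for `rule`: spam hits / all hits, optionally restricted to
    messages that also hit the rule named in `given`."""
    hits = [
        m for m in messages
        if rule in m["rules"] and (given is None or given in m["rules"])
    ]
    if not hits:
        return None
    spam_hits = sum(1 for m in hits if m["is_spam"])
    return spam_hits / len(hits)


# Invented corpus: 5 spam, 5 nonspam. Each rule alone hits spam and
# nonspam equally often, but both rules together only ever hit spam.
corpus = [
    {"is_spam": True,  "rules": {"RULE_A", "RULE_B"}},
    {"is_spam": True,  "rules": {"RULE_A", "RULE_B"}},
    {"is_spam": True,  "rules": {"RULE_B"}},
    {"is_spam": True,  "rules": set()},
    {"is_spam": True,  "rules": set()},
    {"is_spam": False, "rules": {"RULE_A"}},
    {"is_spam": False, "rules": {"RULE_A"}},
    {"is_spam": False, "rules": {"RULE_B"}},
    {"is_spam": False, "rules": {"RULE_B"}},
    {"is_spam": False, "rules": {"RULE_B"}},
]

print(s_o_ratio(corpus, "RULE_A"))                  # 0.5
print(s_o_ratio(corpus, "RULE_B"))                  # 0.5
print(s_o_ratio(corpus, "RULE_A", given="RULE_B"))  # 1.0
```

Judged one at a time, both rules look useless at 0.5; judged together they
separate this made-up corpus perfectly.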

C



Harold Hallikainen
2002-08-21 02:57:23 UTC
Permalink
Haiku'da Been a Spam Filter

http://www.wired.com/news/technology/0,1282,54645,00.html



Matthew Cline
2002-08-21 03:47:45 UTC
Permalink
Post by Harold Hallikainen
http://www.wired.com/news/technology/0,1282,54645,00.html
Summary: a company will offer short snippets of original, copyrighted and
trademarked text that can be inserted into email message headers, and email
filters can recognize this as a "not-spam" indicator. Any spammers who use
the text will be sued for copyright and trademark infringement.

The company's site is http://www.habeas.com/, and the copyrighted/trademarked
text is this haiku:

winter into spring,
brightly anticipated,
like Habeas SWE(TM)
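
On the filter side, the check could be about as simple as the Python sketch
below. The header names (X-Example-Haiku-1 and so on) and the score of -2.0
are placeholders invented here; neither the article nor this thread says
which headers or scores would really be used.

```python
from email import message_from_string

# Placeholder header names, invented for this sketch.
HAIKU_HEADERS = {
    "X-Example-Haiku-1": "winter into spring,",
    "X-Example-Haiku-2": "brightly anticipated,",
    "X-Example-Haiku-3": "like Habeas SWE(TM)",
}
WARRANT_SCORE = -2.0  # arbitrary illustrative value


def warrant_score(raw_message):
    """Return WARRANT_SCORE if every expected line is present verbatim,
    otherwise 0.0."""
    msg = message_from_string(raw_message)
    for name, expected in HAIKU_HEADERS.items():
        value = msg.get(name)
        if value is None or value.strip() != expected:
            return 0.0
    return WARRANT_SCORE
```

Nothing here stops a spammer from copying the lines; the deterrent is meant
to be the legal action, not the check itself.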

The "Services" page lists "complimentary" filtering services as "SpamAssassin
and BrightMail". Heh, we get top billing.

The negative thing about their system is that it's "patent pending", and I
don't exactly like the idea of helping a business that uses software patents,
but if it helps to reduce false positives...
--
Give a man a match, and he'll be warm for a minute, but set him on
fire, and he'll be warm for the rest of his life.

ICQ: 132152059 | Advanced SPAM filtering software: http://spamassassin.org


Craig R.Hughes
2002-08-21 06:39:13 UTC
Permalink
It's not really a software patent. As I understand it, they are
patenting the concept of putting a copyrighted string in an
RFC822 header in order to identify spam, and in order to create
a private right of action against spammers. It actually is a
really neat legal trick. In fact, patenting this is not
necessarily a bad thing anyway, since it really only makes sense
to have one such header to check for -- if there are hundreds,
it'll dilute the effectiveness of the method. I've been
talking with Dan Kohn about this concept for a while now (since
the time I was contemplating Deersoft), and I think it sounds
like quite a promising concept -- we'll see how well they do at
enforcing their copyright against Chinese spammers and the like.

C
Post by Matthew Cline
The negative thing about their system is that it's "patent pending", and I
don't exactly like the idea of helping a business that uses software patents,
but if it helps to reduce false positives...
Scott A Crosby
2002-08-21 12:16:02 UTC
Permalink
Post by Craig R.Hughes
It's not really a software patent. As I understand it, they are
patenting the concept of putting a copyrighted string in an RFC822
header in order to identify spam, and in order to create a private
right of action against spammers. It actually is a really neat legal
trick. In fact, patenting this is not necessarily a bad thing anyway,
since it really only makes sense to have one such header to check for
-- if there are hundreds, it'll dilute the effectiveness of the
method. I've been talking with Dan Kohn about this concept for a
while now (since the time I was contemplating Deersoft), and I think
it sounds like quite a promising concept -- we'll see how well they do
at enforcing their copyright against Chinese spammers and the like.
About as well as the RIAA has been at preventing infringement. :)


And the RIAA has much better odds, IMHO. First, they can at least claim
significant monetary losses for infringement, and the size of the
works is much larger.

There is also prior art.. Look at my email header, for example. Note
the 'Organization: Rice University'? If I wasn't here, I'm pretty sure
that they'd klomp down on me for using it. Now, that's trademark law,
not patent law.

Also, weren't there a few cases a few years ago about, for example,
copyrighting URLs, and using that to prevent deep-linking? (How'd
those turn out anyways?) If the only way to send email to you is to
mechanically put a 'FooHeader: Bar' in it, if doing that is copyright
infringement, then why isn't deep-linking also copyright infringement?

Hell, if that works, here's a better idea:

I'll copyright/trademark my email address, then sue the spammers
for improperly using it. (After all, they can't send me email
without using my email address, right?)

In essence, the scheme sounds like putting a legal (rather than
cryptographic) certificate in email, then klomping down on those
who forge the certificate.

But if you're doing this, why not just do a cryptographic
cert? Unforgeable[1], and not requiring after-the-fact legal
enforcement. Users can just trash all email without a PGP sig
(and/or not on their web-of-trust).
(and/or not on their web-of-trust.)

I'm not saying that certs are a bad idea.. But I don't want to have
yet more time and money spent on lawyers, so how about cryptographic
certs?

Scott

[1] I'm going by the informal definition.


Theo Van Dinter
2002-08-21 14:25:37 UTC
Permalink
Post by Scott A Crosby
those turn out anyways?) If the only way to send email to you is to
mechanically put a 'FooHeader: Bar' in it, if doing that is copyright
infringement, then why isn't deep-linking also copyright infringement?
Adding "FooHeader: Bar" requires you to actually have a certain bit
of text (Bar). Deep-linking is following an index to information.
You don't have anything except a pointer to where something (information
in this case) is located.

If URLs are copyright-able, then my street address and name are
copyright-able. After all, they're all pointers (which are intangible)
to something. Can I start suing people for sending me mail because they
didn't get my permission to use my address? What about telemarketers who
ask for me by name? "Sorry, 'Theo Van Dinter' is copyrighted material,
it'll cost you $25,000 per call to use that name!" It's ridiculous.
--
Randomly Generated Tagline:
"A good rice cooker will have a hinged top and pink floral patterns on
it, btw." - Eric Lakin


Scott A Crosby
2002-08-21 21:42:49 UTC
Permalink
Post by Theo Van Dinter
Post by Scott A Crosby
those turn out anyways?) If the only way to send email to you is to
mechanically put a 'FooHeader: Bar' in it, if doing that is copyright
infringement, then why isn't deep-linking also copyright infringement?
Adding "FooHeader: Bar" requires you to actually have a certain bit
of text (Bar). Deep-linking is following an index to information.
[1]If I remember right, you can't copyright something if there's only ONE
way to express it. (I think this was a case with someone who made
Nintendo Gameboy games. The Gameboy checks the first hundred bytes of
a cartridge ROM to make sure they equal a particular snippet of code
(the bit that displays 'nintendo'), code that they claimed the
copyright for. The idea was that anyone who didn't pay them
couldn't, by copyright law, use that code, and because of the check,
would have their cartridges not work.)

[1]In that case, using that code was ruled NOT infringing. Copyright
applies to expression, not to something where there exists but one way
of doing it. In the above, I assume that the only way to get email to be
received is to put 'Bar' in it. By that precedent, 'Bar' is
uncopyrightable.[1]

[1]IE, the haiku may be copyrightable in a book, because it's his unique
expression. However, as a mechanically added thing to email (because,
say, without it, ANY email is ignored), duplication of it would not be
ruled infringement.
Post by Theo Van Dinter
You don't have anything except a pointer to where something (information
in this case) is located.
Scott



[1] I can't find a good reference for that case, but I *believe* that
this is how it was decided. I may be in error.


Craig R.Hughes
2002-08-22 17:29:06 UTC
Permalink
It's a bit more subtle than that though, since email without the
header is *not* ignored. It's just not considered less-spammy.
Certainly there is a functional aspect to the Haiku used in this
way, but if you read the actual Haiku, you could certainly argue
that it is artistically hobbled if it's placed in the wrong
location, and that placing the Haiku in the wrong artistic
environment without license would constitute an unauthorized
derivative work.

C
Post by Scott A Crosby
[1]IE, the haiku may be copyrightable in a book, because it's his unique
expression. However, as a mechanically added thing to email (because,
say, without it, ANY email is ignored), duplication of it would not be
ruled infringement.
Craig R.Hughes
2002-08-22 16:26:47 UTC
Permalink
Trouble is, you don't necessarily know ahead of time who's
wanting to send you stuff. I don't have your PGP public key on
my keychain. You can do all the signing you want, it's not
going to help. However, if you stick a Habeas header in your
mail, I can (hopefully) be reasonably sure that you're not
trying to spam me. It is going to be all about enforcement
though; we'll have to see how they do on enforcement.

C
Post by Scott A Crosby
In essence, the scheme sounds like putting a legal (rather than
cryptographic) certificate in email, then klomping down on those
who forge the certificate.
But if you're doing this, why not just do a cryptographic
cert? Unforgeable[1], and not requiring after-the-fact legal
enforcement. Users can just trash all email without a PGP sig
(and/or not on their web-of-trust).
Tony L. Svanstrom
2002-08-22 17:13:10 UTC
Permalink
Post by Craig R.Hughes
Trouble is, you don't necessarily know ahead of time who's wanting to send
you stuff. I don't have your PGP public key on my keychain. You can do all
the signing you want, it's not going to help. However, if you stick a Habeas
header in your mail, I can (hopefully) be reasonably sure that you're not
trying to spam me. It is going to be all about enforcement though; we'll
have to see how they do on enforcement.
Signing messages to someone ('s public key) isn't impossible, using some
creative scripting you could even do it with OpenPGP-compatible software...


/Tony
--
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 ***@svanstrom.com =*= #

perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'



Craig R.Hughes
2002-08-22 18:22:43 UTC
Permalink
So do you know my public key? Does the guy who wants to buy
10,000 licenses of SpamAssassin Pro? Do I really want to lose
email from either of you?

C
Post by Tony L. Svanstrom
Post by Craig R.Hughes
Trouble is, you don't necessarily know ahead of time who's wanting to send
you stuff. I don't have your PGP public key on my keychain. You can do all
the signing you want, it's not going to help. However, if you stick a Habeas
header in your mail, I can (hopefully) be reasonably sure that you're not
trying to spam me. It is going to be all about enforcement though; we'll
have to see how they do on enforcement.
Signing messages to someone ('s public key) isn't impossible, using some
creative scripting you could even do it with OpenPGP-compatible software...
Sidney Markowitz
2002-08-22 19:54:41 UTC
Permalink
Post by Craig R.Hughes
So do you know my public key? Does the guy who wants to buy
10,000 licenses of SpamAssassin Pro? Do I really want to lose
email from either of you?
I think this discussion has gotten a little off track. The idea is to
provide something that acts like a better whitelist than anything that can
easily be forged.

Let's look at the alternatives:

1. Simple From: header whitelist: Easy to test for, but requires a whitelist
entry inserted for every expected sender, can easily be forged

2. Use the From: header with some Received line rules: not generally
applicable, can only be set up by receiver on a case by case basis

3. Habeas style copyright and trademarked phrase: Easy for sender to license
and use. Easy to test for, but can easily be forged by anyone who doesn't
feel threatened by the legal repercussions

4. DNSBL: Only certifies an ip address, not who is using it. As it is
currently used, DNSBL allows you to look up if some IP address has been
blacklisted by someone. What I haven't seen is a service that provides a DNS
based whitelist. That might be a useful service -- A legitimate mailing list
service provider registers the static ip addresses of their private mail
servers with the DNSWL, perhaps for a fee. The DNSWL lists the ip address,
revoking the listing if there are verified complaints of spam being sent
through that ip address. Filters like SpamAssassin can give negative points
to an ip address being listed on the DNSWL. How to detect forged Received
lines is left as an exercise for the implementor.

5. Digital signature certified by a whitelisting service: This is the one
that is the cryptographic version of Habeas. The bulk mailer who wants their
newsletters whitelisted registers with the service and gets a signed
certificate. They use the certificate to sign all of their newsletters, with
the signature in a header. That means that their mail sending software has
to be able to do that. SpamAssassin recognizes the header, looks up the
public key at the whitelist service's site, verifies that it has not been
revoked, verifies the signature as matching the mail assuming that nothing
in transit has munged the contents too much to mess up the hash, and gives
it some whitelist negative points. Yeah, right. Well, maybe it could work as
an unforgeable whitelist header that is not seen too often but is used by
some large commercial non-spamming bulk mailers.

6. Geeks look up your public key and send encrypted mail to you. That works
until spammers start using PGP encryption :-)

There are tradeoffs between having to know people before you whitelist them
vs using somebody else's certification that a sender deserves to be
whitelisted; also between how easy it is to forge what is being tested
versus how easy it is to send vs how easy it is to verify. Habeas provides
the third party certification of people you don't already know, makes it
easy to send, makes it easy to verify, and makes forgery as easy or as
difficult as the legal consequences make it for any particular spammer.

Personally I think that the DNSWL idea is the best. It is easy to set up a
mail server on its own static ip address, and for smaller operations it
would be easy to contract with a mailing service provider who would be
responsible for policing their customers in order to keep their ip addresses
certified.
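
For what it's worth, the lookup side of such a DNSWL could mirror the way
DNSBLs are queried: reverse the octets of the IPv4 address, append the list's
zone, and see whether the name resolves. A minimal Python sketch, in which
the zone dnswl.example.org is a made-up placeholder and "any successful
lookup means listed" is an assumption:

```python
import ipaddress
import socket


def dnswl_query_name(ip, zone="dnswl.example.org"):
    """Build the DNS name to look up for an IPv4 address.

    The default zone is a placeholder, not a real service.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_whitelisted(ip, zone="dnswl.example.org"):
    """Assume 'listed' means the query name resolves to an address."""
    try:
        socket.gethostbyname(dnswl_query_name(ip, zone))
    except OSError:
        return False
    return True
```

This only helps if the IP address comes from a Received line that can be
trusted, which is the part left as an exercise above.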

-- sidney




Sidney Markowitz
2002-08-22 19:54:28 UTC
Permalink
Post by Craig R.Hughes
So do you know my public key? Does the guy who wants to buy
10,000 licenses of SpamAssassin Pro? Do I really want to lose
email from either of you?
I think this discussion has gotten a little off track. The idea is to
provide something that acts like a better whitelist than anything that can
easily be forged.

Let's look at the alternatives:

1. Simple From: header whitelist: Easy to test for, but requires a whitelist
entry inserted for every expected sender, can easily be forged

2. Use the From: header with some Received line rules: not generally
applicable, can only be set up by receiver on a case by case basis

3. Habeas style copyright and trademarked phrase: Easy for sender to license
and use. Easy to test for, but can easily be forged by anyone who doesn't
feel threatened by the legal repercussions

4. DNSBL: Only certifies an ip address, not who is using it. As it is
currently used, DNSBL allows you to look up if some IP address has been
blacklisted by someone. What I haven't seen is a service that provides a DNS
based whitelist. That might be a useful service -- A legitimate mailing list
service provider registers the static ip addresses of their private mail
servers with the DNSWL, perhaps for a fee. The DNSWL lists the ip address,
revoking the listing if there are verified complaints of spam being sent
through that ip address. Filters like SpamAssassin can give negative points
to an ip address being listed on the DNSWL. How to detect forged Received
lines is left as an exercise for the implementor.

5. Digital signature certified by a whitelisting service: This is the one
that is the cryptographic version of Habeas. The bulk mailer who wants their
newsletters whitelisted registers with the service and gets a signed
certificate. They use the certificate to sign all of their newletters, with
the signature in a header. That means that their mail sending software has
to be able to do that. SpamAssassin recognizes the header, looks up the
public key at the whitelist service's site, verifies that it has not been
revoked, verifies the signature as matching the mail assuming that nothing
in transit has munged the contents too much to mess up the hash, and gives
it some whitelist negative points. Yeah, right. Well, maybe it could work as
an unforgeable whitelist header that is not seen too often but is used by
some large commercial non-spamming bulk mailers.

6. Geeks look up your public key and send encrypted mail to you. That works
until spammers start using PGP encryption :-)

There are tradeoffs between having to know people before you whitelist them
vs using somebody else's certification that a sender deserves to be
whitelisted; also between how easy it is to forge what is being tested
versus how easy it is to send vs how easy it is to verify. Habeas provides
the third party certification of people you don't already know, makes it
easy to send, makes it easy to verify, and makes forgery as easy or as
difficult as the legal consequences make it for any particular spammer.

Personally I think that the DNSWL idea is the best. It is easy to set up a
mail server on its own static ip address, and for smaller operations it
would be easy to contract with a mailing service provider who would be
responsible for policing their customers in order to keep their ip addresses
certified.
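
A rough sketch of what such a DNSWL check could look like on the filter side, assuming the list follows the same reversed-octet query convention that DNSBLs use. The zone name `dnswl.example.org`, the function names and the -2.0 bonus are all made up for illustration; no such service is named in this thread.

```python
import ipaddress
import socket

# Hypothetical zone name; no real whitelist service is implied.
DNSWL_ZONE = "dnswl.example.org"


def dnswl_query_name(ip, zone=DNSWL_ZONE):
    """Build the DNS name to look up, using the reversed-octet DNSBL convention."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_whitelisted(ip, zone=DNSWL_ZONE, resolve=socket.gethostbyname):
    """Return True if the zone publishes an A record for this address."""
    try:
        resolve(dnswl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


def whitelist_points(ip, bonus=-2.0, **kwargs):
    """Negative points for a listed relay, zero otherwise (bonus is arbitrary)."""
    return bonus if is_whitelisted(ip, **kwargs) else 0.0
```

The `resolve` parameter is only there so the lookup can be swapped for a stub when testing. Deciding which Received line supplies a trustworthy IP address is a separate problem that this sketch does not address.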

-- sidney




-------------------------------------------------------
This sf.net email is sponsored by: OSDN - Tired of that same old
cell phone? Get a new here for FREE!
https://www.inphonic.com/r.asp?r=sourceforge1&refcode1=vs3390
Bart Schaefer
2002-08-26 02:58:31 UTC
Permalink
Post by Sidney Markowitz
1. Simple From: header whitelist: Easy to test for, but requires a whitelist
entry inserted for every expected sender, can easily be forged
2. Use the From: header with some Received line rules: not generally
applicable, can only be set up by receiver on a case by case basis
The new SA AWL stuff that scores emailaddress + IPaddress pairs is sort of
a combination of these two, which might make it possible to distinguish
most forgeries. What's missing is a way to verify the routing implied by
the Received: headers, to be sure none of those were falsified.
Post by Sidney Markowitz
4. DNSBL: Only certifies an ip address, not who is using it. As it is
currently used, DNSBL allows you to look up if some IP address has been
blacklisted by someone. What I haven't seen is a service that provides a
DNS based whitelist.
There's a practical reason for that: Any DNS list (white or black) works
only for a limited number of IPs; the set of unlisted IPs is much larger
than the set in the DNS list. If you have to make a binary decision to
accept or reject, you'll be wrong less often if you reject the blacklist
and accept everything else, rather than accept the whitelist and reject
everything else.

A whitelist is only helpful when (a) you only want mail from a limited
number of known sources, or (b) you can use a secondary system like SA to
decide what to do with the vast unlisted masses. Most MTAs still make
only the binary decision, because the secondary computation is expensive.

With SA's cooperation, though, it might be worth a try. Even better if
one could get commercial anti-spam outfits to agree to factor it in.
Post by Sidney Markowitz
5. Digital signature certified by a whitelisting service: This is the
one that is the cryptographic version of Habeas. The bulk mailer who
wants their newsletters whitelisted registers with the service and gets
a signed certificate. They use the certificate to sign all of their
newsletters, with the signature in a header. [...] assuming that nothing
in transit has munged the contents too much to mess up the hash, and
gives it some whitelist negative points. Yeah, right.
There are a large number of reasons why it doesn't work to put a signature
computed over the body into the headers, and why it also doesn't work to
sign both the headers and the body without adding an extra encapsulation.
Look through the S/MIME drafts and mailing list archives if you need gory
details: http://www.ietf.org/html.charters/smime-charter.html
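
One of those reasons can be illustrated with a small sketch: a hash computed over the raw body stops matching as soon as a relay rewrites line endings or pads a line. The canonicalization below is a made-up simplification for illustration, not what S/MIME or any other standard specifies.

```python
import hashlib


def raw_digest(body: bytes) -> str:
    """Hash the body exactly as received."""
    return hashlib.sha1(body).hexdigest()


def canonical_digest(body: bytes) -> str:
    """Hash after a simple, made-up canonicalization: normalize line endings
    to LF, strip trailing whitespace per line, drop trailing empty lines."""
    lines = body.replace(b"\r\n", b"\n").split(b"\n")
    lines = [line.rstrip() for line in lines]
    while lines and lines[-1] == b"":
        lines.pop()
    return hashlib.sha1(b"\n".join(lines)).hexdigest()


sent = b"Monthly newsletter\r\nIssue 12\r\n"
# A relay that rewrites line endings and pads a line changes the raw bytes.
relayed = b"Monthly newsletter \nIssue 12\n\n"

print(raw_digest(sent) == raw_digest(relayed))              # False
print(canonical_digest(sent) == canonical_digest(relayed))  # True
```

Canonicalizing only helps with the changes it anticipates; a gateway that re-encodes or re-wraps the body would still break the match.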

In fact, I'm curious to find out just how well Habeas's mark manages to
make its way through various mail relays and gateways and reflectors.
Post by Sidney Markowitz
6. Geeks look up your public key and send encrypted mail to you. That
works until spammers start using PGP encryption :-)
This doesn't need encryption, just signatures. The sender doesn't need
the recipient's key, either; but the recipient has to be able to get the
sender's key, and to believe that the putative sender and the creator of
the key are really the same entity.



Craig R.Hughes
2002-08-26 15:33:19 UTC
Permalink
There is a whitelist RBL now! Ironport's Bonded Sender is
basically a whitelist RBL where you post a bond to get on the
list, and then lose the bond if you end up spamming from that IP
address (or something like that). http://www.bondedsender.org/

C
Post by Bart Schaefer
Post by Sidney Markowitz
4. DNSBL: Only certifies an ip address, not who is using it. As it is
currently used, DNSBL allows you to look up if some IP address has been
blacklisted by someone. What I haven't seen is a service that provides a
DNS based whitelist.
There's a practical reason for that: Any DNS list (white or black) works
only for a limited number of IPs; the set of unlisted IPs is much larger
than the set in the DNS list. If you have to make a binary decision to
accept or reject, you'll be wrong less often if you reject the blacklist
and accept everything else, rather than accept the whitelist and reject
everything else.
A whitelist is only helpful when (a) you only want mail from a limited
number of known sources, or (b) you can use a secondary system like SA to
decide what to do with the vast unlisted masses. Most MTAs still make
only the binary decision, because the secondary computation is expensive.
With SA's cooperation, though, it might be worth a try. Even better if
one could get commercial anti-spam outfits to agree to factor it in.
Tony L. Svanstrom
2002-08-22 20:22:05 UTC
Permalink
Post by Tony L. Svanstrom
Signing messages to someone ('s public key) isn't impossible, using some
creative scripting you could even do it with OpenPGP-compatible software...
So do you know my public key? Does the guy who wants to buy 10,000 licenses
of SpamAssassin Pro? Do I really want to lose email from either of you?
View it as a backdoor not easily exploited by your average spammer, like Adam
Back's HashCash[*]; something you can use to filter some e-mails to the top of your
"what to read first"-list.


/Tony
[*] <URL: http://www.cypherspace.org/hashcash/ >
--
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 ***@svanstrom.com =*= #

perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'





Tony L. Svanstrom
2002-08-21 12:21:10 UTC
Permalink
Post by Matthew Cline
Post by Harold Hallikainen
http://www.wired.com/news/technology/0,1282,54645,00.html
Summary: a company will offer short snippets of original, copyrighted and
trademarked text that can be inserted into email message headers, and email
filters can recognize this as a "not-spam" indicator. Any spammers who use
the text will be sued for copyright and trademark infringement.
The negative thing about their system is that it's "patent pending", and I
don't exactly like the idea of helping a business that uses software patents,
but if it helps to reduce false positives...
Cute idea, but this will both open up a hole that's easily found in SA and it
will support a company that is patenting the idea of including a unique "this
is not spam"-message in e-mails...
What'll happen when they start getting low on money? Will they start hunting
people down for using passwords in subject lines to whitelist people, or will
SpamAssassin and/or Deersoft have to start paying a fee for recognizing their
lil haiku?

Yes, saying that they'll sue anyone using it is easy, but the sad fact is that
even if they're Bill Gates they can't reach a lot of the spammers out there;
not to mention that it isn't even a sure thing that their text is/could be
copyrighted in every part of the world.


/Tony
--
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 ***@svanstrom.com =*= #

perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'



Matthew Cline
2002-08-21 21:02:30 UTC
Permalink
Post by Tony L. Svanstrom
What'll happen when they start getting low on money, they start hunting
people down for using passwords in subject-lines to whitelist people, or
maybe SpamAssassin and/or Deersoft must start paying a fee for
recognizing their lil haiku.
The nature of our patent is about putting a warrant mark in RFC 2822
X-headers (or in the body)
So no existing authentication method is covered by the patent.
--
Give a man a match, and he'll be warm for a minute, but set him on
fire, and he'll be warm for the rest of his life.

ICQ: 132152059 | Advanced SPAM filtering software: http://spamassassin.org


Tony L. Svanstrom
2002-08-21 21:38:45 UTC
Permalink
Post by Matthew Cline
Post by Tony L. Svanstrom
What'll happen when they start getting low on money, they start hunting
people down for using passwords in subject-lines to whitelist people, or
maybe SpamAssassin and/or Deersoft must start paying a fee for
recognizing their lil haiku.
The nature of our patent is about putting a warrant mark in RFC 2822
X-headers (or in the body)
So no existing authentication method is covered by the patent.
Well, without reading it you don't know for sure, I mean, it isn't a two-line
document...

Anyways, I find the whole thing very interesting, but I think that it will
give some not-very-nice-people ideas and that something like this could go very
wrong when/if popular enough and the company behind it runs out of money.


/Tony
--
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 ***@svanstrom.com =*= #

perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'



Craig R.Hughes
2002-08-22 16:29:00 UTC
Permalink
Good luck! They need us a whole heck of a lot more than we need them.

C
Post by Tony L. Svanstrom
maybe SpamAssassin and/or Deersoft must start paying a fee for
recognizing their lil haiku.
Craig R.Hughes
2002-08-22 16:35:33 UTC
Permalink
For me the question of enforcement comes down to who the
"average" spammer is. I personally think a huge volume of spam
comes from some mom&pop operation where Mom bought into a
get-rich-quick scheme "Make money from your living room" type
thingie, hooked up a DSL line out to the farmhouse in rural
Iowa, and started sending out medium volumes of spam for a
living, not knowing how much it pisses people off. In her mind,
Mom is doing nothing particularly wrong. Heck her list provider
assured her it was a double-confirmed-super-opt-in list. All
those people want to get the spam. And the guy who sold her the
spamming software for $59.95 told her it was fully legit, and
would get through all those dumb spam filters out there which
weren't smart enough to realize she isn't actually a spammer,
since it's a double-super-opt-in list (not). So the software
automatically inserts Habeas SWE headers. Now Habeas gets wind
that Mom is spamming, and using their headers in an unlicensed
way. Habeas sues mom? Wins? Gets a lien on the farm? Evicts
mom and pop?

If a large enough number of spammers are of this type (and I
think they probably are), then Habeas could have problems. And
that's not even considering the Korean/Chinese spammers who are
probably not even reachable by Habeas' enforcement agents. Now
to deal with both of these more cheaply than through legal
means, Habeas does have the HIL RBL system... It could be
they'll do OK.

C
Post by Tony L. Svanstrom
Yes, saying that they'll sue anyone using it is easy, but the sad fact is that
even if they're Bill Gates they can't reach a lot of the spammers out there;
not to mention that it isn't even a sure thing that their text is/could be
copyrighted in every part of the world.
Matthew Cline
2002-08-23 00:22:29 UTC
Permalink
Post by Craig R.Hughes
Mom is doing nothing particularly wrong. Heck her list provider
assured her it was a double-confirmed-super-opt-in list. All
those people want to get the spam. And the guy who sold her the
spamming software for $59.95 told her it was fully legit, and
would get through all those dumb spam filters out there which
weren't smart enough to realize she isn't actually a spammer,
since it's a double-super-opt-in list (not). So the software
automatically inserts Habeas SWE headers. Now Habeas gets wind
that Mom is spamming, and using their headers in an unlicensed
way. Habeas sues mom? Wins? Gets a lien on the farm? Evicts
mom and pop?
All they'd need to do would be to *threaten* to sue the Mom&Pop operation,
and they'd stop (or switch to other spam software). Also, they could
probably sue the SpamWare makers for including the SWE headers.
Post by Craig R.Hughes
If a large enough number of spammers are of this type (and I
think they probably are), then Habeas could have problems. And
that's not even considering the Korean/Chinese spammers who are
probably not even reachable by Habeas' enforcement agents.
You could ignore the SWE headers for mail that's Korean or Chinese (although
that kinda defeats the purpose of Habeas...)
--
Give a man a match, and he'll be warm for a minute, but set him on
fire, and he'll be warm for the rest of his life.

ICQ: 132152059 | Advanced SPAM filtering software: http://spamassassin.org


Craig R.Hughes
2002-08-23 00:52:03 UTC
Permalink
Post by Matthew Cline
Post by Craig R.Hughes
Mom is doing nothing particularly wrong. Heck her list provider
assured her it was a double-confirmed-super-opt-in list. All
those people want to get the spam. And the guy who sold her the
spamming software for $59.95 told her it was fully legit, and
would get through all those dumb spam filters out there which
weren't smart enough to realize she isn't actually a spammer,
since it's a double-super-opt-in list (not). So the software
automatically inserts Habeas SWE headers. Now Habeas gets wind
that Mom is spamming, and using their headers in an unlicensed
way. Habeas sues mom? Wins? Gets a lien on the farm? Evicts
mom and pop?
All they'd need to do would be to *threaten* to sue the Mom&Pop operation,
and they'd stop (or switch to other spam software). Also, they could
probably sue the SpamWare makers for including the SWE headers.
They might or they might not. Sometimes people react
aggressively to receiving cease-and-desist letters, particularly
when they don't think they're doing anything wrong. The
SpamWare maker might have evaporated by the time Habeas tries to
find them. Mom&Pop could easily be the only target. And
Mom&Pop might not be evil people -- the analogy I used with Dan
was that of Javert who spends his entire life pursuing Valjean,
only to find that Valjean is in fact a good man, ultimately
leading to Javert's suicide.
Post by Matthew Cline
Post by Craig R.Hughes
If a large enough number of spammers are of this type (and I
think they probably are), then Habeas could have problems. And
that's not even considering the Korean/Chinese spammers who are
probably not even reachable by Habeas' enforcement agents.
You could ignore the SWE headers for mail that's Korean or Chinese (although
that kinda defeats the purpose of Habeas...)
The spam could be in English, just originating from somewhere
people couldn't give a shit about Habeas' copyright.

C



Michael Moncur
2002-08-23 05:21:07 UTC
Permalink
Post by Matthew Cline
All they'd need to do would be to *threaten* to sue the Mom&Pop operation,
and they'd stop (or switch to other spam software). Also, they could
probably sue the SpamWare makers for including the SWE headers.
My idea about this is that there's not much point in stopping any one
Mom&Pop spammer. It seems like most of them only spam ONCE and then either
get a clue or get shut off by their ISP... but there's an endless supply of
people to take their place.

But if it could put pressure on the spamware makers, it might be worth it.

--
Michael Moncur mgm at starlingtech.com http://www.starlingtech.com/
"The cure for writer's cramp is writer's block." --Inigo DeLeon



Bart Schaefer
2002-08-22 15:12:50 UTC
Permalink
Post by Matthew Cline
Post by Harold Hallikainen
http://www.wired.com/news/technology/0,1282,54645,00.html
Summary: a company will offer short snippets of original, copyrighted
and trademarked text that can be inserted into email message headers,
and email filters can recognize this as a "not-spam" indicator. Any
spammers who use the text will be sued for copyright and trademark
infringement.
They may be in for a patent fight before any of this goes forward:

http://www.eweek.com/article2/0,3959,476558,00.asp

"Banking on the fact that few enterprises sending commercial mail want to
be associated with spam, IronPort Systems Inc., in San Bruno, Calif., has
developed the Bonded Sender program in an effort to give legitimate bulk
e-mailers some credibility."



Scott A Crosby
2002-08-21 11:57:35 UTC
Permalink
Post by Craig R.Hughes
Remember that the GA is going to be considering combinatorial uses of
the rules, so rules which look dodgy on their own might be gems for
the GA -- perhaps something with a S/O ratio of .5 actually occurs
often in combination with some other rule, and in those situations,
helps to distinguish spam vs nonspam.
We're currently using a perceptron classifier. It *can't* learn
combinations of rules.[1]

I gave an example, assume 4 rules:
SPAM = (A or B) and (C or D)

It cannot learn that function.

A decision tree classifier *can* learn that example, and the function
above, where a .5 S/O rule is only important in certain
circumstances. (Then again, it may be smartest to hardcode a meta-rule
for that case, rather than trust to a naive DT learner.)
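
The claim can be checked with a small experiment. This is only a sketch with a textbook perceptron update rule, not the scorer SA actually uses: it tries to fit the four-rule example on all 16 input combinations, first with the raw rules and then with one extra hand-written meta-rule input. The raw case cannot work because the hits {A,C} and {B,D} must both score above the threshold while {A,B} and {C,D} must both score at or below it, and the two pairs add up to the same total weight.

```python
from itertools import product


def target(a, b, c, d):
    """The example from above: SPAM = (A or B) and (C or D)."""
    return (a or b) and (c or d)


def train_perceptron(samples, epochs=1000):
    """Classic perceptron rule; returns (weights, bias) once an epoch passes
    with no mistakes, or None if that never happens within `epochs`."""
    n = len(samples[0][0])
    w, bias = [0] * n, 0
    for _ in range(epochs):
        mistakes = 0
        for x, label in samples:
            predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else 0
            if predicted != label:
                step = label - predicted  # +1 or -1
                w = [wi + step * xi for wi, xi in zip(w, x)]
                bias += step
                mistakes += 1
        if mistakes == 0:
            return w, bias
    return None


inputs = list(product([0, 1], repeat=4))
raw = [(x, int(target(*x))) for x in inputs]
# Same data with one extra input: a hand-written meta-rule that fires
# exactly when (A or B) and (C or D) is true.
with_meta = [(x + (int(target(*x)),), int(target(*x))) for x in inputs]

print(train_perceptron(raw))        # None: no linear separator exists
print(train_perceptron(with_meta))  # some (weights, bias) that fits
```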

Scott


[1] To be fair, nor can a Bayes classifier.


Craig R.Hughes
2002-08-22 16:20:05 UTC
Permalink
It *can* learn combinatorial stuff in a more subtle way.
Imagine rules A, B and C, and S1, M2 and M3 as messages (S means
spam, M is mail):

S1: A + B
M2: A
M3: B

It could learn that A or B is not spam, but A+B is spam.

Similarly:

M1: A + B
M2: A
S3: B

It can learn that.
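
For concreteness, here is one set of scores that handles each of the two cases, as a sketch. The numbers are picked by hand rather than produced by the GA, and the threshold of 5 is assumed from the "pushed score over 5" remark elsewhere in this thread.

```python
THRESHOLD = 5.0  # messages scoring at or above this are treated as spam


def is_spam(hits, scores):
    """Sum the scores of the rules that hit and compare with the threshold."""
    return sum(scores[rule] for rule in hits) >= THRESHOLD


# First case: A alone or B alone is mail, A and B together is spam.
case1 = {"A": 3.0, "B": 3.0}
print(is_spam({"A", "B"}, case1), is_spam({"A"}, case1), is_spam({"B"}, case1))
# True False False

# Second case: B alone is spam, but A (alone or with B) marks mail.
case2 = {"A": -2.0, "B": 6.0}
print(is_spam({"A", "B"}, case2), is_spam({"A"}, case2), is_spam({"B"}, case2))
# False False True
```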

I'm still not convinced on the decision tree theory -- it could
well work out fine, but I suspect you're likely to end up with
either a very poorly-resolving tree, or massive overfitting.
I'll wait to see some practical results though...

C
On Tue, 20 Aug 2002 11:14:50 -0700, Craig R.Hughes
Remember that the GA is going to be considering combinatorial uses of
the rules, so rules which look dodgy on their own might be gems for
the GA -- perhaps something with a S/O ratio of .5 actually occurs
often in combination with some other rule, and in those situations,
helps to distinguish spam vs nonspam.
We're currently using a perceptron classifier. It *can't* learn
combinations of rules.[1]
SPAM = (A or B) and (C or D)
It cannot learn that function.
A decision tree classifier *can* learn that example, and the function
above, where a .5 S/O rule is only important in certain
circumstances. (Then again, it may be smartest to hardcode a meta-rule
for that case, rather than trust to a naive DT learner.)
Scott
[1] To be fair, nor can a Bayes classifier.
Scott A Crosby
2002-08-23 02:21:01 UTC
Permalink
It *can* learn combinatorial stuff in a more subtle way. Imagine
No it can't..

It can learn a few examples that happen to be linearly separable,
like those you gave. It cannot learn the example I gave below, which
is not linearly separable.
I'm still not convinced on the decision tree theory -- it could well
work out fine, but I suspect you're likely to end up with either a
very poorly-resolving tree, or massive overfitting. I'll wait to see
some practical results though...
Now this I'll agree with. :)

Scott
Post by Scott A Crosby
We're currently using a perceptron classifier. It *can't* learn
combinations of rules.[1]
SPAM = (A or B) and (C or D)
It cannot learn that function.
A decision tree classifier *can* learn that example, and the function
above, where a .5 S/O rule is only important in certain
circumstances. (Then again, it may be smartest to hardcode a meta-rule
for that case, rather than trust to a naive DT learner.)
Scott
[1] To be fair, nor can a Bayes classifier.
Bob Proulx
2002-08-23 04:31:45 UTC
Permalink
Post by Scott A Crosby
Post by Scott A Crosby
We're currently using a perceptron classifier. It *can't* learn
combinations of rules.[1]
SPAM = (A or B) and (C or D)
It cannot learn that function.
It *can* learn combinatorial stuff in a more subtle way. Imagine
[...]
No it can't..
It can learn a few examples that happen to be linearly separable,
like those you gave. It cannot learn the example I gave below, which
is not linearly separable.
Scott is absolutely correct. In fact I am finding this thread very
humorous since it parallels the historical research into neural
networks. In the 1950's there was a huge surge of interest in neural
networks. But they were just not producing what people wanted. In
1969 Minsky and Papert published "Perceptrons"
with a now famous picture on the cover which showed two drawings which
were particularly difficult for a NN to tell apart. Are the lines
connected or not? That is something humans find difficult
too.

In Perceptrons they mathematically proved many of the limitations of
NNs. They proved that a single layer NN cannot differentiate between
sets which are not linearly separable. That one book became a
cornerstone of NN research. And because of the proof of limitations
of single layer NN it virtually killed all NN research for 20 years.
A long time of virtually zero activity in the area ensued because
everyone was thinking only of the limitations which were proved.

Then in the mid-1980's NN interest picked up again. It was realized
that just because a single layer NN has limitations does not mean that
other architectures of NN must also have those same limitations. And they
don't. Multilayer neural networks are now seen as a way to advance NN
capabilities. Multilayer neural networks can overcome many of the
problems and limitations of single layer neural networks.
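
As a small illustration, the function from earlier in the thread, SPAM = (A or B) and (C or D), can be computed by a network with one hidden layer of two threshold units. The weights in this sketch are set by hand to show that such a network exists; nothing here is learned, and it is not a proposal for how SA should be structured.

```python
from itertools import product


def step(x):
    """Threshold unit: fires when its weighted input is positive."""
    return 1 if x > 0 else 0


def two_layer(a, b, c, d):
    """Hand-set weights, not learned ones."""
    h1 = step(a + b - 0.5)       # hidden unit 1: A or B
    h2 = step(c + d - 0.5)       # hidden unit 2: C or D
    return step(h1 + h2 - 1.5)   # output unit: h1 and h2


# Check against the boolean definition on all 16 inputs.
for a, b, c, d in product([0, 1], repeat=4):
    assert two_layer(a, b, c, d) == int((a or b) and (c or d))
print("all 16 cases match")
```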

SA is basically a single layer classification engine. Probably adding
hidden layers to the engine would solve similar problems. It
will be fun to watch this solution getting discovered for the first
time, all over again.

Bob
Craig R.Hughes
2002-08-23 05:58:26 UTC
Permalink
It's not so much that we're discovering it all over again, it's
much more that:

1) current system works pretty good
2) too lazy to change it to make it marginally better

C
Post by Bob Proulx
It
will be fun to watch this solution getting discovered for the first
time, all over again.
Craig R.Hughes
2002-08-23 05:53:44 UTC
Permalink
I never claimed it could learn *all* combinatorial
possibilities, but it certainly can learn some.

C
On Thu, 22 Aug 2002 09:20:05 -0700, Craig R.Hughes
It *can* learn combinatorial stuff in a more subtle way. Imagine
No it can't..
It can learn a few examples that happen to be linearly separable,
like those you gave. It cannot learn the example I gave below, which
is not linearly separable.
Justin Mason
2002-08-20 15:24:58 UTC
Permalink
Post by Malte S. Stretz
Post by Justin Mason
argh. Yes, I had to delete some mass-check outputs from some people
using VERY old test files, as well... must have missed one. :(
Has this much impact on the outcome, what do you think? Maybe a script might
be useful which compares the logs to the current ruleset and barfs if it
finds rules there which don't exist (anymore)...
yep, it does. I've been outright deleting logs that contain old test
names. future mass-checks will include a date/time/version header too.
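
A check along the lines Malte suggests might look like the sketch below. The log line layout ("score, path, comma-separated rule names") is an assumption made for the example and may not match what mass-check really writes; `OLD_TEST_NAME` is likewise invented.

```python
def rules_in_log(lines):
    """Collect rule names from log lines.

    Assumed (hypothetical) line format: "<score> <path> <RULE1,RULE2,...>";
    lines starting with '#' are comments. The real format may differ.
    """
    seen = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3:
            seen.update(name for name in fields[2].split(",") if name)
    return seen


def check_log(lines, current_rules):
    """Raise ValueError if the log mentions rules missing from the ruleset."""
    unknown = rules_in_log(lines) - set(current_rules)
    if unknown:
        raise ValueError("log uses unknown rules: " + ", ".join(sorted(unknown)))


current = {"CLICK_BELOW", "BIG_FONT", "FWD_MSG"}
check_log(["7 /spam/1 CLICK_BELOW,BIG_FONT"], current)   # passes silently
try:
    check_log(["3 /spam/2 BIG_FONT,OLD_TEST_NAME"], current)
except ValueError as err:
    print(err)   # log uses unknown rules: OLD_TEST_NAME
```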

--j.


Bart Schaefer
2002-08-20 15:39:01 UTC
Permalink
Post by Justin Mason
(I plan to either comment, or move the tests to an 'attic' file in CVS
btw, so rehabilitators who think they can up hitrates after the 2.40
release are welcome.)
I presume it's too late at this point to suggest new tests. I've got a
couple in my local.cf that don't appear in very much spam (21 out of 1090
in the last 7 weeks) but were the deciding factor (pushed score over 5)
in 4 of those cases, and contributed 30% of the score in 4 more borderline
cases. Of course, they're scored by hand at the moment, so they might not
rank as high once put through the GA ...
Post by Justin Mason
comments? protests? ;)
0.534 0.593 0.499 0.54 -1.00 FWD_MSG
0.111 0.099 0.118 0.46 -1.00 BALANCE_FOR_LONG_40K
In this case (negative scores) I presume you mean they're being dropped
due to FNs?
Post by Justin Mason
0.000 0.000 0.000 0.00 4.00 BUGGY_CGI_DE
0.000 0.000 0.000 0.00 1.00 BUGGY_CGI_PT_2
0.000 0.000 0.000 0.00 1.00 BUGGY_CGI_ES
0.000 0.000 0.000 0.00 4.00 BUGGY_CGI_PT
0.000 0.000 0.000 0.00 1.00 REMOVE_ES_07
0.000 0.000 0.000 0.00 1.00 REMOVE_ES_06
0.000 0.000 0.000 0.00 1.00 REMOVE_ES_05
0.000 0.000 0.001 0.00 1.00 REMOVE_ES_02
0.000 0.000 0.001 0.00 1.00 SUBSCRIBE_ES_01
0.000 0.000 0.000 0.00 4.00 BUGGY_CGI_DE_2
Are you sure those aren't lousy hit rates just because non-English
languages are under-represented in the corpus?



Craig R.Hughes
2002-08-20 17:53:33 UTC
Permalink
Justin, are you sure those are *real* FPs, not accidental FPs?
WWW_REMOVEYOU_COM for instance...

Also, among these rules are ones I think are really pretty
good -- I'd like to leave them in there, run the GA to rescore
against all the submitted corpus logs, then release 2.40 with
all the rules. Then do the traditional
sit-back-and-wait-for-the-screaming, and modify things
(judiciously) in a 2.41 release if necessary. I have great
faith after taking this approach with 2.30 that the GA does a
very very good job most of the time. One thing I do when
running the GA is freeze scores for anything where there's no
(or few) examples in the corpus. Basically, this means:

make freqs
sort -rn freqs

Then copy the names of the rules with no hits in the full corpus
into the unmutated_test thing at the top of logs-to-c, but it
looks like you've changed that code now...
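
The selection step could be scripted roughly as follows. This is a sketch that assumes the freqs output has the same six columns as the table posted in this thread (OVERALL% SPAM% NONSPAM% S/O SCORE NAME). The S/O formula is inferred from the posted numbers, not taken from the SA source, and the function names are invented.

```python
def parse_freqs_line(line):
    """Split one row: OVERALL% SPAM% NONSPAM% S/O SCORE NAME."""
    overall, spam, nonspam, so, score, name = line.split()
    return {
        "name": name,
        "overall": float(overall),
        "spam": float(spam),
        "nonspam": float(nonspam),
        "so": float(so),
        "score": float(score),
    }


def spam_over_overall(spam_pct, nonspam_pct):
    """S/O as it appears to be computed in the table: SPAM% / (SPAM% + NONSPAM%)."""
    total = spam_pct + nonspam_pct
    return spam_pct / total if total else 0.0


def rules_to_freeze(lines):
    """Names of rules showing no hits at all, i.e. candidates for fixed scores."""
    rows = [parse_freqs_line(line) for line in lines if line.strip()]
    return [r["name"] for r in rows if r["spam"] == 0.0 and r["nonspam"] == 0.0]


sample = [
    "22.947 54.626 3.796 0.94 1.53 CLICK_BELOW",
    "0.534 0.593 0.499 0.54 -1.00 FWD_MSG",
    "0.000 0.000 0.000 0.00 4.00 BUGGY_CGI_DE",
]
print(round(spam_over_overall(54.626, 3.796), 2))  # 0.94
print(rules_to_freeze(sample))                      # ['BUGGY_CGI_DE']
```

Because the percentages are rounded to three decimals, a rule with a handful of hits in a 300,000-message corpus also shows as 0.000, so working from raw hit counts would be more precise if they are available.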

I'm really not in favor of junking most of these tests.

C
Post by Justin Mason
0.008 0.014 0.004 0.77 1.00 FREE_HOSTING
0.035 0.048 0.027 0.64 1.56 DONT_DELETE
0.307 0.301 0.311 0.49 -1.00 SUBJECT_HAS_DATE
0.005 0.010 0.003 0.78 2.36 WWW_REMOVEYOU_COM
0.489 0.466 0.503 0.48 1.71 SPAM_REDIRECTOR
0.062 0.078 0.052 0.60 1.76 PLEASE_READ
0.045 0.059 0.037 0.61 1.00 SUSPICIOUS_CC_RECIPS
0.002 0.003 0.001 0.87 2.10 LONG_NUMERIC_HTTP_ADDR
0.005 0.009 0.003 0.77 1.00 INCREASE_TRAFFIC
0.060 0.073 0.052 0.58 0.57 CASINO
0.534 0.593 0.499 0.54 -1.00 FWD_MSG
0.111 0.099 0.118 0.46 -1.00 BALANCE_FOR_LONG_40K
0.038 0.048 0.031 0.61 1.00 BIG_BUCKS
0.000 0.001 0.000 1.00 1.00 GREEN_EXCUSE_2
0.000 0.001 0.000 1.00 1.00 GREEN_EXCUSE_1
0.000 0.001 0.000 1.00 1.00 CLICK_TO_REMOVE_3
0.000 0.001 0.000 1.00 1.00 PSYCHIC
0.000 0.001 0.000 1.00 4.20 YR_MEMBERSHIP_EXCH
0.000 0.001 0.000 1.00 1.00 MYCASINOBUILDER
0.000 0.001 0.000 1.00 1.00 INCREDIBLE
0.000 0.001 0.000 1.00 1.00 THIS_AINT_JUNK
0.002 0.004 0.001 0.81 1.00 MURKOWSKI_CRUFT
0.119 0.124 0.116 0.52 1.00 PORN_11
0.094 0.100 0.090 0.53 1.00 KNOWN_BAD_DIALUPS
0.048 0.056 0.044 0.56 1.00 MIME_BOUND_DIGITS_6
0.002 0.001 0.003 0.22 -1.00 TRACK_NUMBER
0.006 0.009 0.004 0.67 1.00 SIGNIFICANT
0.003 0.004 0.002 0.73 1.00 PORN_1
0.004 0.006 0.003 0.70 1.00 MONSTERHUT
0.067 0.067 0.066 0.50 1.00 FROM_NAME_EQ_FROM_ADDR
0.553 0.713 0.457 0.61 -1.00 BALANCE_FOR_LONG_20K
0.243 0.202 0.268 0.43 1.00 SUBJ_HAS_Q_MARK
0.315 0.253 0.352 0.42 1.00 PORN_3
0.264 0.217 0.292 0.43 1.00 PORN_10
0.004 0.006 0.003 0.66 2.00 RATWARE_YMR
0.018 0.021 0.017 0.55 1.00 RATWARE
0.002 0.003 0.002 0.69 1.00 X_ANTIABUSE
0.002 0.003 0.001 0.71 2.00 RATWARE_ANSMTP
0.002 0.003 0.001 0.71 1.00 CHANGE_TERMS
0.119 0.103 0.128 0.44 1.00 PORN_14
1.422 0.829 1.780 0.32 1.00 DOUBLE_CAPSWORD
0.023 0.022 0.023 0.49 1.00 FREE_TICKETS
1.297 1.989 0.878 0.69 -0.30 OUTLOOK_FW_MSG
0.002 0.003 0.002 0.62 1.00 SLASH_PRICE
0.015 0.015 0.015 0.50 5.00 FORGED_EBAY_RCVD
9.021 15.707 4.979 0.76 -1.00 USER_AGENT_OE
3.676 1.786 4.819 0.27 1.11 TO_MALFORMED
4.658 2.188 6.151 0.26 1.00 TO_BE_REMOVED_REPLY
0.215 0.136 0.262 0.34 1.00 T_FREE_WEBSITE
0.537 0.303 0.679 0.31 3.67 FROM_MALFORMED
0.806 0.429 1.034 0.29 -0.97 MIME_NULL_BLOCK
0.002 0.003 0.002 0.55 2.00 RATWARE_CSMTP
1.134 0.567 1.476 0.28 1.00 X_NOT_PRESENT
0.003 0.003 0.003 0.52 1.00 PRICES_WONT_LAST
0.275 0.160 0.345 0.32 1.00 HTML_COMMENT_UNIQUE_ID
0.140 0.203 0.102 0.67 -1.00 FAILURE_NOTICE_2
0.456 0.247 0.582 0.30 1.00 NIGERIAN_SCAM_12
0.149 0.221 0.105 0.68 -1.00 FAILURE_NOTICE_1
0.278 0.155 0.352 0.31 1.00 FROM_AND_TO_SAME
1.097 0.513 1.450 0.26 1.00 FROM_NAME_NO_SPACES
0.031 0.041 0.024 0.63 -1.00 T_NASD_FINANCIAL
0.041 0.057 0.031 0.65 -1.00 PRIVACY_STATEMENT
0.004 0.003 0.005 0.42 1.00 INCREASE_SALES
1.282 2.342 0.642 0.78 -0.10 USER_AGENT_AOL
1.741 0.606 2.427 0.20 0.48 TO_LOCALPART_EQ_REAL
0.001 0.001 0.001 0.45 1.00 REMOVE_ES_03
1.497 0.451 2.130 0.17 1.00 SUSPECT_LIST_HEADERS
0.034 0.015 0.046 0.24 -2.50 FROM_AND_TO_SAME_4
9.272 11.011 8.222 0.57 0.38 SUPERLONG_LINE
0.341 0.681 0.136 0.83 -1.00 MAILER_DAEMON
0.033 0.013 0.045 0.22 1.00 URI_IS_POUND
0.008 0.003 0.011 0.23 0.50 OUTLOOK_UNDISC_RECIPS
0.820 1.744 0.262 0.87 -1.00 USER_AGENT_THEBAT
1.061 0.134 1.621 0.08 1.00 SUBJ_ENDS_IN_Q_MARK
2.961 0.200 4.630 0.04 4.26 FROM_MISSING
0.046 0.002 0.073 0.02 1.00 SIGNATURE_DELIM
0.008 0.003 0.011 0.18 1.00 REAL_THING
0.000 0.000 0.000 0.00 1.00 SELECTED
0.004 0.000 0.006 0.00 1.00 NIGERIAN_SCAM_11
0.000 0.000 0.001 0.00 1.00 EXCUSE_ES_02
0.000 0.000 0.000 0.00 1.00 WEALTH
0.000 0.000 0.000 0.00 1.00 EXCUSE_ES_03
0.000 0.000 0.000 0.00 1.00 POPLAUNCH
0.000 0.000 0.000 0.00 1.00 BREAKTHROUGH
0.000 0.000 0.000 0.00 1.00 RATWARE_38
0.000 0.000 0.000 0.00 1.00 RATWARE_37
0.000 0.000 0.000 0.00 1.00 RATWARE_36
0.000 0.000 0.000 0.00 1.00 VIGORA
0.000 0.000 0.000 0.00 1.00 RATWARE_35
0.000 0.000 0.000 0.00 1.00 IRS
0.000 0.000 0.000 0.00 1.00 RATWARE_34
0.000 0.000 0.000 0.00 1.00 CLICKSFORMONEY_NET
0.000 0.000 0.000 0.00 1.00 RATWARE_33
0.003 0.009 0.000 1.00 0.00 TO_INVESTORS
0.000 0.000 0.000 0.00 1.00 RATWARE_32
0.000 0.000 0.000 0.00 1.00 RATWARE_31
0.000 0.000 0.000 0.00 4.25 SHORT_RECEIVED_LINE
0.000 0.000 0.000 0.00 1.00 SPY_ON_FRIENDS
0.000 0.000 0.000 0.00 1.00 RATWARE_30
0.000 0.000 0.000 0.00 -1.00 MSN_FOOTER2
0.000 0.000 0.000 0.00 1.00 FILTERED_BY_WORLDREMOVE
0.000 0.000 0.000 0.00 1.00 NIGERIAN_SCAM
0.000 0.000 0.000 0.00 1.00 EU_200_32_CE
0.000 0.000 0.000 0.00 4.00 BUGGY_CGI_DE
0.000 0.000 0.000 0.00 1.00 BUGGY_CGI_PT_2
0.000 0.000 0.000 0.00 1.00 RATWARE_27
0.000 0.000 0.000 0.00 1.00 RATWARE_26
0.000 0.000 0.000 0.00 1.00 RATWARE_25
0.000 0.000 0.000 0.00 1.00 RATWARE_24
0.000 0.000 0.000 0.00 4.00 STAINLESS_STEEL
0.000 0.000 0.000 0.00 1.00 RATWARE_21
0.000 0.000 0.000 0.00 1.00 RATWARE_20
0.000 0.000 0.000 0.00 1.00 WWW_NETSITESFORFREE_NET
0.000 0.000 0.000 0.00 1.00 BACKED_BY
0.000 0.000 0.000 0.00 1.00 RATWARE_19
0.000 0.000 0.000 0.00 1.00 RATWARE_18
0.000 0.000 0.000 0.00 1.00 RATWARE_16
0.000 0.000 0.000 0.00 1.00 FREEWEBHOSTINGCENTRAL
0.000 0.000 0.000 0.00 1.00 RATWARE_15
0.000 0.000 0.000 0.00 1.00 RATWARE_14
0.000 0.000 0.000 0.00 1.00 PRINT_OUT_AND_FAX
0.000 0.000 0.000 0.00 1.00 RATWARE_13
0.000 0.000 0.000 0.00 1.00 RATWARE_12
0.000 0.000 0.000 0.00 1.00 BUGGY_CGI_ES
0.000 0.000 0.000 0.00 1.00 RATWARE_10
0.000 0.000 0.000 0.00 1.00 RATWARE_11
0.000 0.000 0.000 0.00 1.00 T_SUBJ_ISO885915
0.000 0.000 0.000 0.00 2.00 RATWARE_MAMA
0.000 0.000 0.000 0.00 4.00 BUGGY_CGI_DE_3
0.000 0.000 0.000 0.00 1.00 WWW_DIRECTFORCEMARKETING_COM
0.000 0.000 0.000 0.00 1.00 FREEWEBCO_NET_URL
0.001 0.000 0.001 0.00 1.80 EU_EMAIL_OPTOUT
0.000 0.000 0.000 0.00 2.67 CORRUPT_MSGID
0.000 0.000 0.000 0.00 4.00 BUGGY_CGI_PT
0.000 0.000 0.000 0.00 1.00 FREEMEGS_URL
0.029 0.078 0.000 1.00 -1.00 USER_AGENT_GNUS_XM
0.000 0.000 0.000 0.00 1.00 EXCUSE_8
0.000 0.000 0.000 0.00 1.00 RATWARE_40
0.000 0.000 0.000 0.00 1.00 ITS_EFFECTIVE
0.000 0.000 0.000 0.00 1.00 SHOES_GUY
0.000 0.000 0.000 0.00 1.00 REMOVE_ES_07
0.000 0.000 0.000 0.00 1.00 REMOVE_ES_06
0.000 0.000 0.000 0.00 1.00 WEB4PORNO_URL
0.000 0.000 0.000 0.00 1.00 RATWARE_00
0.000 0.000 0.000 0.00 1.00 YELLOWSUN
0.001 0.000 0.001 0.00 2.00 RATWARE_NETMAILER
0.000 0.000 0.000 0.00 1.00 RATWARE_01
0.000 0.000 0.000 0.00 1.00 BRAND_NEW_PAGER
0.000 0.000 0.000 0.00 3.46 FAKED_UNDISC_RECIPS_AT
0.002 0.005 0.000 1.00 -1.00 MAILBITS_EMAIL
0.000 0.000 0.000 0.00 1.00 REMOVE_ES_05
0.000 0.000 0.000 0.00 1.00 RATWARE_02
0.000 0.000 0.000 0.00 1.00 RATWARE_05
0.000 0.000 0.000 0.00 1.00 REMOVE_ES_04
0.000 0.000 0.000 0.00 1.00 NIGERIAN_SCAM_8
0.000 0.000 0.000 0.00 1.00 RATWARE_07
0.000 0.000 0.001 0.00 1.00 REMOVE_ES_02
0.000 0.000 0.000 0.00 1.00 E_WEBHOSTCENTRAL_URL
0.000 0.000 0.000 0.00 2.00 RATWARE_COGNI
0.000 0.000 0.000 0.00 1.00 NO_SELLING
0.000 0.000 0.000 0.00 2.00 RATWARE_HASH_1
0.000 0.000 0.000 0.00 1.00 RATWARE_45
0.000 0.000 0.000 0.00 -1.00 USER_AGENT_GNUS_UA
0.000 0.000 0.001 0.00 1.00 SUBSCRIBE_ES_01
0.000 0.000 0.000 0.00 1.00 RATWARE_43
0.000 0.000 0.000 0.00 1.00 RATWARE_42
0.000 0.000 0.000 0.00 1.00 RATWARE_41
0.000 0.000 0.000 0.00 1.00 CLICK_TO_REMOVE_MAILTO
0.001 0.000 0.001 0.00 1.00 BONUS_PAYMENT
0.000 0.000 0.000 0.00 1.00 ANOTHER_NET_AD
0.000 0.000 0.000 0.00 1.00 LIFE_INSURANCE
0.000 0.000 0.000 0.00 4.00 BUGGY_CGI_DE_2
0.000 0.000 0.000 0.00 1.00 T_FROM_ISO885915
0.000 0.000 0.000 0.00 1.00 NO_SPENDING
0.002 0.000 0.003 0.00 1.00 T_MONTH_TRIAL
0.000 0.000 0.001 0.00 1.00 T_FREE_HOSTING
0.004 0.000 0.006 0.00 1.00 T_MEMBER_2
0.000 0.000 0.001 0.00 1.00 T_SAVINGS
0.000 0.000 0.000 0.00 3.90 LASER_PRINTER
0.001 0.000 0.002 0.00 1.00 A_HREF_TO_REMOVE
0.000 0.000 0.001 0.00 1.00 T_FREE_INSTALL
0.001 0.000 0.002 0.00 1.00 T_SUBJ_FREE_CAP
0.001 0.000 0.002 0.00 1.00 PORN_8
0.018 0.000 0.028 0.00 1.00 T_TRADEMARK
0.042 0.000 0.067 0.00 1.00 UNIFIED_PATCH
0.004 0.000 0.006 0.00 1.00 T_USER_4U2
0.000 0.000 0.001 0.00 1.00 T_FREE_ACCESS
0.001 0.000 0.001 0.00 1.00 A_HREF_TO_UNSUB
0.008 0.000 0.013 0.00 1.00 T_DOMAIN_4U2
0.000 0.000 0.001 0.00 1.00 T_UNLIMITED
--j.
--
'Justin Mason' => { url => http://jmason.org/ , blog =>
http://taint.org/ }
-------------------------------------------------------
This sf.net email is sponsored by: OSDN - Tired of that same old
cell phone? Get a new here for FREE!
https://www.inphonic.com/r.asp?r=sourceforge1&refcode1=vs3390
_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
Daniel Quinlan
2002-08-20 19:24:22 UTC
Permalink
Post by Justin Mason
Here we go.....
Ummm, I must have missed the announcement of when and where to upload
mass-check results. I also must have missed the announcement of the
rules freeze prior to removing tests.

Where has the process gone?
Post by Justin Mason
comments? protests? ;)
Yes, both... *sigh*

- Dan


Matthew Cline
2002-08-20 23:38:18 UTC
Permalink
Post by Daniel Quinlan
Post by Justin Mason
Here we go.....
Ummm, I must have missed the announcement of when and where to upload
mass-check results. I also must have missed the announcement of the
rules freeze prior to removing tests.
<AOL>Me too.</AOL>
--
Give a man a match, and he'll be warm for a minute, but set him on
fire, and he'll be warm for the rest of his life.

ICQ: 132152059 | Advanced SPAM filtering software: http://spamassassin.org


Craig R.Hughes
2002-08-21 01:46:56 UTC
Permalink
Ok folks... I think there's been some confusion, so I recommend
the following:

1. Use the CVS tag spamassassin_pre_2_4_0 which I just created
to generate your mass-checks
2. Submit the results via rsync in the usual way ASAP
3. Justin and I will cooperate on running the GA sometime around
midnight PST (GMT-0800) on thursday/friday

Any good data that's in the rsync corpus by that time will be
used to feed the GA. That will give everyone about 48 hours
from now to generate the logs and submit them, plus it'll give
Justin and me about 48 hours to tinker with the GA (and
hopefully get it compiling again on my machine grumble
grumble). That should then yield a 2.40 release sometime on
friday, possibly saturday if we need to do multiple GA runs to
tweak things properly.

C

PS Volunteering Justin here, since I suspect he's probably fast
asleep right now.
Post by Matthew Cline
Post by Daniel Quinlan
Post by Justin Mason
Here we go.....
Ummm, I must have missed the announcement of when and where to upload
mass-check results. I also must have missed the announcement of the
rules freeze prior to removing tests.
<AOL>Me too.</AOL>
--
Give a man a match, and he'll be warm for a minute, but set him on
fire, and he'll be warm for the rest of his life.
http://spamassassin.org
Matthew Cline
2002-08-21 02:00:49 UTC
Permalink
Post by Craig R.Hughes
Ok folks... I think there's been some confusion, so I recommend
1. Use the CVS tag spamassassin_pre_2_4_0 which I just created
to generate your mass-checks
Doing "perl Makefile.PL" for that tag, I get:

Checking if your kit is complete...
Warning: the following files are missing in your kit:
INSTALL
Please inform the author.
Writing Makefile for Mail::SpamAssassin
--
Give a man a match, and he'll be warm for a minute, but set him on
fire, and he'll be warm for the rest of his life.

ICQ: 132152059 | Advanced SPAM filtering software: http://spamassassin.org


Craig R.Hughes
2002-08-21 06:23:14 UTC
Permalink
Justin just added that to the MANIFEST, but I guess forgot to
check it in. Shouldn't matter as far as running mass-check
though. Just need to make sure we add it in before final
release.

C
Post by Matthew Cline
Post by Craig R.Hughes
Ok folks... I think there's been some confusion, so I recommend
1. Use the CVS tag spamassassin_pre_2_4_0 which I just created
to generate your mass-checks
Checking if your kit is complete...
INSTALL
Please inform the author.
Writing Makefile for Mail::SpamAssassin
Craig R.Hughes
2002-08-21 01:24:39 UTC
Permalink
I missed it too, but I see a whopping 583 unread mails in my
sa-dev mailbox, so I'm guessing it's in there somewhere :)

I noticed there was no tag (at least I couldn't see one) as to
what the "legit" CVS state is to run mass-check against, so I
ran mine against the current tip (as of this morning).

C
Post by Daniel Quinlan
Post by Justin Mason
Here we go.....
Ummm, I must have missed the announcement of when and where to upload
mass-check results. I also must have missed the announcement of the
rules freeze prior to removing tests.
Where has the process gone?
Post by Justin Mason
comments? protests? ;)
Yes, both... *sigh*
Dan Kohn
2002-08-21 06:19:55 UTC
Permalink
Matt, I'm the chairman and co-founder of Habeas, and I am, in general,
opposed to software patents. However, I can reassure you that Habeas
does not have any software patents, because we don't have any software!
Habeas is the first startup that is not doing hardware or software
development, but is instead practicing "legal engineering".

The nature of our patent is about putting a warrant mark in RFC 2822
X-headers (or in the body) to allow senders to warrant their mail as
*not-spam*, and then to use copyright and trademark infringement to
enable Habeas to enforce that warranty.

More than you would really want to know about Habeas is at
<http://www.habeas.com/faq/>. However, I will add your software patent
question.

- dan
--
Dan Kohn <mailto:***@dankohn.com>
<http://www.dankohn.com/> <tel:+1-650-327-2600>


-----Original Message-----
From: Matthew Cline [mailto:***@nightrealms.com]
Sent: Tuesday, August 20, 2002 8:48 PM
To: SpamAssassin Talk ML
Subject: Re: [SAtalk] SA In The News
Post by Harold Hallikainen
http://www.wired.com/news/technology/0,1282,54645,00.html
Summary: a company will offer short snippets of original, copyrighted
and
trademarked text that can be inserted into email message headers, and
email
filters can recognize this as a "not-spam" indicator. Any spammers who
use
the text will be sued for copyright and trademark infringement.

The company's site is http://www.habeas.com/, and the
copyrighted/trademarked
text is this haiku:

winter into spring,
brightly anticipated,
like Habeas SWE(TM)

The "Services" page lists "complimentary" filtering services as
"SpamAssassin
and BrightMail". Heh, we get top billing.

The negative thing about their system is that it's "patent pending", and
I
don't exactly like the idea of helping a business that uses software
patents,
but if it helps to reduce false positives...
--
Give a man a match, and he'll be warm for a minute, but set him on fire,
and he'll be warm for the rest of his life.

ICQ: 132152059 | Advanced SPAM filtering software:
http://spamassassin.org


Matthew Cline
2002-08-21 06:41:43 UTC
Permalink
Post by Dan Kohn
Matt, I'm the chairman and co-founder of Habeas, and I am, in general,
opposed to software patents. However, I can reassure you that Habeas
does not have any software patents, because we don't have any software!
Habeas is the first startup that is not doing hardware or software
development, but is instead practicing "legal engineering".
OK, I guess the patent you're applying for isn't a software patent. However,
it does seem similar to a software patent, as it's patenting a
concept/idea/method, and many of the arguments against software patents could
(I think) also be used against this patent.
--
Give a man a match, and he'll be warm for a minute, but set him on
fire, and he'll be warm for the rest of his life.

ICQ: 132152059 | Advanced SPAM filtering software: http://spamassassin.org


Ross Vandegrift
2002-08-21 14:59:36 UTC
Permalink
[Gratuitously off-topic]
Post by Matthew Cline
Post by Dan Kohn
Matt, I'm the chairman and co-founder of Habeas, and I am, in general,
opposed to software patents. However, I can reassure you that Habeas
does not have any software patents, because we don't have any software!
Habeas is the first startup that is not doing hardware or software
development, but is instead practicing "legal engineering".
OK, I guess the patent you're applying for isn't a software patent. However,
it does seem similar to a software patent, as it's patenting a
concept/idea/method, and many of the arguments against software patents could
(I think) also be used against this patent.
I was at a party last week at one of my best friends' house. His aunt
was throwing the party for some old friends of hers. I was a bit
surprised to learn that most of the group were patent examiners.

Being that I have an interest in legal issues, I struck up a
conversation with one of them. Turns out they're all into chemical
patents. They were surprised to hear a college kid was "casually
interested" in patents, and when I mentioned I was interested in (and
against) software and mathematics patents, I was greeted with unanimous
support.

What makes this bit relevant, though, is what one woman began telling
me. She said that though patents for software, math, and business models
have been granted, sections of the department have started gathering up
their materials for a big reevaluation. Seems that somewhere in the patent
office there are stirrings of throwing them all away.

I was pretty shocked, but she made it quite clear that she and many of
the people she worked with think they're complete and utter bunk.
--
Ross Vandegrift
***@willow.seitz.com

A Pope has a Water Cannon. It is a Water Cannon.
He fires Holy-Water from it. It is a Holy-Water Cannon.
He Blesses it. It is a Holy Holy-Water Cannon.
He Blesses the Hell out of it. It is a Wholly Holy Holy-Water Cannon.
He has it pierced. It is a Holey Wholly Holy Holy-Water Cannon.
Batman and Robin arrive. He shoots them.


Ed Greenberg
2002-08-22 03:43:15 UTC
Permalink
Post by Dan Kohn
The nature of our patent is about putting a warrant mark in RFC 2822
X-headers (or in the body) to allow senders to warrant their mail as
*not-spam*, and then to use copyright and trademark infringement to
enable Habeas to enforce that warranty.
So I have what I think is a good idea that would not infringe on Mr. Kohn's
patent, but I am afraid to post it since it would then become part of Mr.
Kohn's business model. I'd give it to the SpamAssassin folks first, after
assuring that it could be shared by all other interested Spam Filtering
products -- wide use being a benefit to such ideas.

Question: Why should open-source spam filtering folks want to support an
email filtering token whose use is not only generating revenue for a third
party but also whose use is forbidden to anybody but that third party?



Tony L. Svanstrom
2002-08-22 13:23:43 UTC
Permalink
Post by Ed Greenberg
Question: Why should open-source spam filtering folks want to support an
email filtering token whose use is not only generating revenue for a third
party but also whose use is forbidden to anybody but that third party?
Nepotism? ;-)


/Tony
--
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 ***@svanstrom.com =*= #

perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'



Justin Shore
2002-08-22 14:45:16 UTC
Permalink
Post by Dan Kohn
The nature of our patent is about putting a warrant mark in RFC 2822
X-headers (or in the body) to allow senders to warrant their mail as
*not-spam*, and then to use copyright and trademark infringement to
enable Habeas to enforce that warranty.
Why is it that I envision Verisign doing something like this? Yeah,
like I want Verisign to sign all my spam, er, mail for me. Pay them
enough and they'd probably sign spam too. ;)

J
--
--
Justin Shore, ES-SS ES-SSR Pittsburg State University
Network & Systems Manager http://www.pittstate.edu/ois/


rODbegbie
2002-08-22 17:16:53 UTC
Permalink
Post by Justin Shore
Why is it that I envision Verisign doing something like this? Yeah,
like I want Verisign to sign all my spam, er, mail for me. Pay them
enough and they'd probably sign spam too. ;)
You mean like this? http://www.postiva.com/

Seems to have become somewhat stillborn since the big announcement in
January.

rOD.


--
"I'm as mad as hell and I'm not going to take this anymore!"

Doing the blogging thang again at http://www.groovymother.com/ <<



Craig R.Hughes
2002-08-22 17:44:36 UTC
Permalink
Because it might help to reduce spam levels? That is the
purpose of SpamAssassin. I don't think there ought to be
political decisions about which spam filtering ideas should or
should not be included, I think rather that it makes sense to
use anything which holds out a good chance of helping out. That
Habeas tags happen to be commercial is not really relevant to
SpamAssassin, IMO.

C
Post by Ed Greenberg
Question: Why should open-source spam filtering folks want to
support an email filtering token whose use is not only
generating revenue for a third party but also whose use is
forbidden to anybody but that third party?
Justin Mason
2002-08-21 12:35:45 UTC
Permalink
Post by Tony L. Svanstrom
Yes, saying that they'll sue anyone using it is easy, but the sad fact
is that even if they're Bill Gates they can't reach a lot of the
spammers out there; not to mention that it isn't even a sure thing that
their text is/could be copyrighted in every part of the world.
well, that's one good thing that came out of GATT and WIPO, along with
pushing the US software patent agenda and all that crap. It *is*
copyrighted worldwide, afaik, apart from a few bizarre places like
Azerbaijan etc.

--j.


Justin Mason
2002-08-21 14:12:47 UTC
Permalink
Of course, at least the non-private part of it. How do you folks handle
this with English corpora? In the main CVS rep on sf.net? Or on private
servers?
There's so much English spam that everybody's got enough on his own ;-)
yep!

But private servers (well, public with passwords) is best, as typically
you may not want to give away info about your spam-traps etc.

--j.


Justin Mason
2002-08-22 17:24:23 UTC
Permalink
OK -- everyone who's run a mass-check using the rules from the
spamassassin_pre_2_4_0 label. Are your results ready? If so, rsync
them up so we can do some freq analysis.
Ummm... I assume I meant spamassassin_pre_2_4_0b?
argh! yes ;)
nonspam-msquadrat.log
I did this one with 2_4_0. But I can run another mass-check with b if you
want...
"b" is better; 2_4_0 did not have the tests I'd commented out re-enabled.
Sooner is better BTW, I plan to

1. figure out the freqs tonight, suggest what tests to drop
2. wait for comments
3. drop tests that nobody cares about tomorrow
4. sed out the dropped tests from the mass-check logs
5. kick off the GA
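
Step 4 above (removing dropped tests from the mass-check logs) could look roughly like this in Python rather than sed. This is a sketch under a loud assumption: that each log line has four whitespace-separated fields, the last being a comma-separated list of the tests that hit. The thread does not show the log format, so the layout, the sample paths, and the function name `strip_dropped_tests` are all hypothetical.

```python
def strip_dropped_tests(log_line, dropped):
    """Remove dropped test names from one mass-check log line.

    ASSUMPTION (not confirmed by the thread): a result line looks like
    "<flag> <score> <path> TEST_A,TEST_B,...". Comment lines and lines
    that do not have exactly four fields are returned unchanged.
    Whitespace between fields is normalized to single spaces.
    """
    stripped = log_line.rstrip("\n")
    fields = stripped.split()
    if stripped.startswith("#") or len(fields) != 4:
        return stripped
    kept = [name for name in fields[3].split(",") if name not in dropped]
    # Omit the test field entirely if every test on the line was dropped.
    return " ".join(fields[:3] + ([",".join(kept)] if kept else []))


line = "Y 7 /corpus/spam/1 CLICK_BELOW,RATWARE_38,BIG_FONT"
print(strip_dropped_tests(line, {"RATWARE_38"}))
```

Note that the score field is left as it was, so after filtering it no longer reflects only the remaining tests.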

BTW I'll be away this weekend at Linuxbierwanderung, so Craig, you might
have to run the GA. ;)

--j.


Craig R.Hughes
2002-08-22 18:19:57 UTC
Permalink
I think with the confusion and all, we should give these folks
until the end of the day to get fixed logs uploaded :)

Justin, on a related note -- I didn't include the spamtrap stuff
from the corpus directory on your server in my logs, but that
should definitely be in there -- don't know if you'd included it
in yours or not. I think it's from Kelsey's mailtraps...

C
Files that are out of date (as far as I can tell -- they have no date
nonspam-danielp.log
nonspam-danielr.log
nonspam-duncan.log
nonspam-james.log
nonspam-laager.log
nonspam-messagelabs.log
nonspam-msquadrat.log
nonspam-olivier.log
nonspam-oliviern.log
nonspam-quinlan.log
nonspam-rodbegbie.log
nonspam-sean.log
nonspam-theo.log
Craig R.Hughes
2002-08-22 18:28:13 UTC
Permalink
Post by Justin Mason
I plan to
1. figure out the freqs tonight, suggest what tests to drop
2. wait for comments
3. drop tests that nobody cares about tomorrow
4. sed out the dropped tests from the mass-check logs
This step is unnecessary -- unless you've changed the scripts
much, any tests in the logs which aren't in the rules files will
just be ignored, I think. You do seem to have changed the
logs-to-c script and removed the bit where you could specify
immutable tests at the top -- I took a brief glance through the
code and couldn't fully make out how it had changed. I think we
want to be able to specify immutable test scores though in there
somewhere -- or is that now handled by the tflags stuff? For
the last couple releases, any test which occurred infrequently
(by thumb-in-the-wind subjective criteria) I set to have
immutable scores, as well as a handful of other rules.
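
To illustrate what "immutable scores" means for a genetic algorithm, here is a minimal Python sketch of a mutation step that skips frozen rules. It is not the project's GA code (which the thread says is generated via logs-to-c and compiled). The function name `mutate_scores` and the `rate` and `sigma` parameters are hypothetical.

```python
import random


def mutate_scores(scores, immutable, rate=0.1, sigma=0.5, rng=None):
    """Return a mutated copy of scores; rules in `immutable` keep their score.

    scores:    dict mapping rule name -> current score
    immutable: set of rule names whose scores must not change
    rate:      probability that any one mutable score is perturbed
    sigma:     standard deviation of the Gaussian perturbation
    """
    rng = rng or random.Random()
    mutated = {}
    for name, score in scores.items():
        if name in immutable or rng.random() >= rate:
            mutated[name] = score  # frozen, or not selected this round
        else:
            mutated[name] = score + rng.gauss(0.0, sigma)
    return mutated


scores = {"CLICK_BELOW": 1.53, "RATWARE_38": 1.00}
print(mutate_scores(scores, immutable={"RATWARE_38"}, rate=1.0))
```

Freezing a rule this way keeps the optimizer from assigning an arbitrary score to a test it has little or no evidence about.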
Post by Justin Mason
5. kick off the GA
BTW I'll be away this weekend at Linuxbierwanderung, so Craig,
you might
have to run the GA. ;)
Shouldn't be a problem. Assuming I can get the darned thing to
compile :)

C



Daniel Rogers
2002-08-22 18:35:27 UTC
Permalink
OK -- everyone who's run a mass-check using the rules from the
spamassassin_pre_2_4_0 label. Are your results ready? If so, rsync
them up so we can do some freq analysis.
nonspam-danielr.log
I thought I uploaded mine last night, but I was rerunning with the 2_4_0b
version, so I've uploaded them again.

Dan.


Robert L Mathews
2002-08-22 19:04:34 UTC
Permalink
Post by Ed Greenberg
Question: Why should open-source spam filtering folks want to support an
email filtering token whose use is not only generating revenue for a third
party but also whose use is forbidden to anybody but that third party?
Because it may work out as a good compensation rule -- e.g. better than
just searching the Subject line for "order status" or similar? ;)
Whether or not Habeas takes off, I don't care. But for it to have a
chance, it needs support from filters -- and if it *does* work, then it
gives us a good thing to filter on.
Ditto for Pyzor, Razor, the DNSBLs, etc. They all give us a helping hand
in determining spam -- and the more we support, the better. And if they
*stop* working effectively, it's just as easy to comment it out, halve its
score, give it a score of 0, etc.
I agree that the concept may be good and might work well, but the
objection isn't to the concept itself.

The objection is to the fact that Habeas is trying to patent the concept.
This means that if the concept itself turns out to be a good one, but
Habeas fails to market the product successfully (or changes their policy
and starts charging ISPs -- despite their promise, it's not legally
enforceable if, for example, they file for bankruptcy and someone else
buys the patent), nobody else will be able to use the same concept.

The fact that Habeas is copyrighting and trademarking their mark is fine;
that's what will make their implementation of the idea work, if it does.
But their attempt to patent it is being done for a different reason: it
adds no anti-spam protection, but merely gives them a legal monopoly on
implementing the concept.

If I were Habeas, I would attempt to patent it, too, because doing so
clearly benefits their company. But that makes the idea proprietary,
which seems to be contrary to the goals of open source software: it means
that other people/products could not implement the same concept if they
wished to do so.

The discussion of this seems to be taking place on two levels: people are
objecting to the idea of the patent, and others are replying with "well,
we'll see if it works and remove it if not". Whether the concept works or
not is not the issue; the question is whether it's desirable to give a
boost to a company that is trying to make an anti-spam technique
proprietary. (And maybe it is okay; perhaps Habeas is investing resources
to implement an idea that nobody else would find financially worth doing
unless they could get patent protection. It just bears discussing.)

------------------------------------
Robert L Mathews, Tiger Technologies



Michael Moncur
2002-08-22 20:01:08 UTC
Permalink
Post by Robert L Mathews
The discussion of this seems to be taking place on two levels: people are
objecting to the idea of the patent, and others are replying with "well,
we'll see if it works and remove it if not". Whether the concept works or
not is not the issue; the question is whether it's desirable to give a
boost to a company that is trying to make an anti-spam technique
proprietary. (And maybe it is okay; perhaps Habeas is investing resources
to implement an idea that nobody else would find financially worth doing
unless they could get patent protection. It just bears discussing.)
As far as I'm concerned, whether it works or not is the *only* issue. I've
always liked SpamAssassin because its core motivation is to detect spam. It
doesn't have political motivations.

There is a wide range of political and legal issues here, certainly -- but
if I had my way, SpamAssassin's perspective would be simply this: "Hmm,
there's a bad haiku showing up in lots of messages lately, and most of them
are nonspam. Let's make a rule for it and see how it works."
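
One way to "see how it works" is to compare how often a candidate rule hits spam versus nonspam. The sketch below is only an illustration: it assumes the S/O figure is the spam hit rate divided by the sum of the spam and nonspam hit rates, which appears to match the S/O column in the mass-check table at the start of this thread. The function name is made up for the example and is not part of the SpamAssassin tools.

```python
def s_over_o(spam_pct, nonspam_pct):
    """Return the share of a rule's hits that land on spam (assumed S/O definition).

    spam_pct    -- percentage of spam messages the rule hits
    nonspam_pct -- percentage of nonspam messages the rule hits
    Returns None when the rule never fires, since the ratio is undefined.
    """
    total = spam_pct + nonspam_pct
    if total == 0:
        return None
    return spam_pct / total


# A rule that mostly hits spam scores close to 1.0 ...
print(round(s_over_o(43.982, 2.532), 2))
# ... and a rule that mostly hits nonspam (a good compensation rule) scores close to 0.0.
print(round(s_over_o(0.069, 30.696), 2))
```

A rule for a mark that shows up mostly in nonspam would be expected to land near 0.0 and so be a candidate for a negative score.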

--
Michael Moncur mgm at starlingtech.com http://www.starlingtech.com/
"It is better to be quotable than to be honest." --Tom Stoppard



Craig R.Hughes
2002-08-22 21:09:44 UTC
Permalink
I don't understand why patents in general are necessarily
contrary to the goals of open source software as a whole. They
probably are contrary to the goals of the FSF, but then so is
copyright (except when used as part of copyleft). I personally
don't have a problem with people patenting things which truly
are novel concepts. What Dan came up with here, the idea of
inserting copyright text into an RFC822 header field to prevent
forgery, is I think quite novel. Sure it seems like something
anyone might have come up with *after* the idea's out there.
Note that this is patent-pending, so if you *do* have any
examples of prior art, you can probably draw them to the
attention of the patent examiner.

C
Post by Robert L Mathews
If I were Habeas, I would attempt to patent it, too, because doing so
clearly benefits their company. But that makes the idea proprietary,
which seems to be contrary to the goals of open source software: it means
that other people/products could not implement the same concept if they
wished to do so.
Tony L. Svanstrom
2002-08-22 21:48:35 UTC
Permalink
Post by Craig R.Hughes
I don't understand why patents in general are necessarily
contrary to the goals of open source software as a whole.
"Hey, I've got an idea that will improve [something]; let's patent it so that
no one else can use it unless I say so."

And that's good for open source in exactly what way?


/Tony
--
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 ***@svanstrom.com =*= #

perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'



Russ Gilman-Hunt
2002-08-22 22:03:59 UTC
Permalink
I think you're talking about the way people use patents and not patents
themselves. You could also imagine that people would use patents as a way of
describing what came first- why bother reinventing a round wheel, if someone
already has one... just get ahold of it and make it better.

-Russ
Post by Tony L. Svanstrom
Post by Craig R.Hughes
I don't understand why patents in general are necessarily
contrary to the goals of open source software as a whole.
"Hey, I've got an idea that will improve [something]; let's patent it so
that no one else can use it unless I say so."
And that's good for open source in exactly what way?
/Tony
Tony L. Svanstrom
2002-08-22 22:30:34 UTC
Permalink
Post by Russ Gilman-Hunt
Post by Tony L. Svanstrom
Post by Craig R.Hughes
I don't understand why patents in general are necessarily
contrary to the goals of open source software as a whole.
"Hey, I've got an idea that will improve [something]; let's patent it so
that no one else can use it unless I say so."
And that's good for open source in exactly what way?
I think you're talking about the way people use patents and not patents
themselves. You could also imagine that people would use patents as a way of
describing what came first- why bother reinventing a round wheel, if someone
already has one... just get ahold of it and make it better.
Since we're living in real life and not imagine-land, yes, it's the way people
use them that I have a problem with; I'll change my opinion about how patents
are generally bad for open source as soon as you tell me how I'd go about
moving into imagine-land.


/Tony
--
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 ***@svanstrom.com =*= #

perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'



Craig R.Hughes
2002-08-22 22:46:09 UTC
Permalink
It's good for people in general because it provides an incentive
for people to innovate. Again, I should repeat that it's only
useful in situations where the thing being patented is actually:

1) Not already being done, and
2) Not obvious

Effectiveness of patent examiners and their ability to determine
(1) and (2) is not guaranteed, but that's what the courts are
for :)

C
Post by Tony L. Svanstrom
Post by Craig R.Hughes
I don't understand why patents in general are necessarily
contrary to the goals of open source software as a whole.
"Hey, I've got an idea that will improve [something]; let's patent it so that
no one else can use it unless I say so."
And that's good for open source in exactly what way?
Matt Sergeant
2002-08-23 09:08:27 UTC
Permalink
Post by Tony L. Svanstrom
Post by Craig R.Hughes
I don't understand why patents in general are necessarily
contrary to the goals of open source software as a whole.
"Hey, I've got an idea that will improve [something]; let's patent it so that
no one else can use it unless I say so."
That's one use of a patent. Another use is in case some predatorial
company attacks them based on their patents/money/lawyers/whatever. They
can be useful as a defense mechanism. They unfortunately also look good
to those with the money - directors and banks/vc's.

You don't have to use a patent predatorially. A zillion other people can
implement your ideas, and your patent doesn't lose any strength.

Matt.




John Rudd
2002-08-23 21:44:58 UTC
Permalink
Post by Tony L. Svanstrom
Post by Craig R.Hughes
I don't understand why patents in general are necessarily
contrary to the goals of open source software as a whole.
"Hey, I've got an idea that will improve [something]; let's patent it so that
no one else can use it unless I say so."
And that's good for open source in exactly what way?
It is entirely tangential to open source. Your beef is not with patents
vs open source, but patents vs free software.

For example, if I patent some code in my product, I could feel more
comfortable distributing the source to my customers because I have some
other mechanism to protect my IP than "obscurity and difficulty of
reverse compilation/engineering". I can instead make the source
available (and thus allow the customer to make customizations and such),
but restrict use of the code to those who have purchased a license
(which may be more or less effective depending on the market in
question). It makes me MORE likely to move toward open source if legal
(not software token) license restrictions alone are effective (Ex: a
software patent in the transmission computer of a car, where I as the
patent holder could write up the code and send it to Toyota including my
code as an example of the algorithm, giving them the source so they can
customize it, and requiring that they pay me a license fee per car sold
that uses my patent, and could sue them if I find my algorithm in their
PROM and they haven't paid me).

The people for whom software patents are bad are the free software
community, who oppose any form of code-slavery. Don't confuse "Open
Source Software" with "Free Software". They're related, but distinct,
concepts.


Justin Mason
2002-08-22 20:56:11 UTC
Permalink
Post by Craig R.Hughes
I think with the confusion and all, we should give these folks
until the end of the day to get fixed logs uploaded :)
ok.
Post by Craig R.Hughes
Justin, on a related note -- I didn't include the spamtrap stuff
from the corpus directory on your server in my logs, but that
should definitely be in there -- don't know if you'd included it
in yours or not. I think it's from Kelsey's mailtraps...
Don't worry about that -- I have an *insane* quantity of spamtrap
data, mostly from Kelsey and a couple of other ISPs. It's got to
the stage where I can just scan the last month's mailboxes, and
get nearly 100,000 unique mails out of it :(

(BTW Craig could you mail me your AIM screen name? I've left
it on my mail in work and can't get near it, doh.)

--j.





Robert L Mathews
2002-08-22 23:09:41 UTC
Permalink
Post by Craig R.Hughes
I don't understand why patents in general are necessarily
contrary to the goals of open source software as a whole.
Because it prevents others from using the same idea. In my opinion, the
basic open source principle is roughly "I'm contributing this
intellectual property to the world so that others may use it for the
general betterment of the community."

Patenting is the opposite of that -- "I'm claiming a temporary monopoly
on this idea so that others may not use it."
Post by Craig R.Hughes
They
probably are contrary to the goals of the FSF, but then so is
copyright (except when used as part of copyleft).
Well, I can't speak for the FSF, but copyright in general isn't contrary
to the goals of open source. Copyright law allows the owner to include a
license allowing others to use and build on the work, making it "open".
So copyright doesn't necessarily lock out others.

I suppose that one could "open source" a patent, too, by granting anyone
a perpetual license to use the patented concept without restriction.
However, Habeas clearly isn't going to do that, so their patent would
explicitly prevent others from using the work and building on it --
making their property "proprietary", which seems pretty much the opposite
of "open" as intended in the phrase "open source".
Post by Craig R.Hughes
What Dan came up with here, the idea of
inserting copyright text into an RFC822 header field to prevent
forgery, is I think quite novel.
I'm not disputing that, and I'll point out that I didn't come out against
including Habeas support in SA -- I'd certainly like additional accuracy
improving methods to be included, all things being equal. (And I myself
own six copyrights registered with the copyright office and a registered
trademark, so I'm certainly not opposed in general to people protecting
their work and making money off it.)

I just said that some people seemed to be missing the point of the
objections, which is that a patent on the Habeas system would mean that
others may not use the idea. That's unlikely to be a problem as long as
Habeas "does the right thing", but I suspect people might feel they made
a mistake by helping the company establish a standard if they later
become wildly profitable (or go bankrupt) and sell the patent to an evil
company that starts charging $1,000 a year for an ISP license, for
example.

Or maybe not. What the hell do I know? :-)

------------------------------------
Robert L Mathews, Tiger Technologies



Robert L Mathews
2002-08-23 00:00:08 UTC
Permalink
Post by Craig R.Hughes
It's good for people in general because it provides an incentive
for people to innovate.
Of course, and you can make the same argument about proprietary software.
Patent law can make ideas proprietary, just like copyright law can make
software proprietary if the author wishes.

"Proprietary" isn't by definition "bad", and the temporary legal monopoly
conferred by patents and copyrights is certainly a creative motivator
(although it's by no means the only one -- to take a narrow example, I've
seen a number of novel spam fighting ideas suggested by people who
didn't, to my knowledge, try to patent them).

But although it's not automatically "bad", patenting an idea to make it
proprietary *is* the opposite of the concept of "open" in "open source".
After all, Habeas doesn't *have* to patent their idea.

------------------------------------
Robert L Mathews, Tiger Technologies



Dan Kohn
2002-08-23 07:13:39 UTC
Permalink
Good thing then that I'm a venture capitalist in addition to the
chairman and co-founder of Habeas.

Tony, I can hardly promise you that Habeas will never do anything bad in
the future. Obviously, if all of our team is run over by a bus and
Spamford Wallace takes over the company, things would be bad. But
that's true of any company. There's no way to make the following
promise legally binding, so instead we place it prominently on our
website and imply that you shouldn't take us seriously if we later go
back on our word:

http://www.habeas.com/faq/index.htm#1.2
1.2. Will you raise your prices?
Habeas may eventually raise the price of the business and commercial
bulk licenses, but we commit to always keeping the individual and ISP
licenses royalty-free.

I think the most we can ask is that, as long as we act reasonably,
responsibly, and responsively, we keep the support of the SA and
other anti-spam communities. I very much appreciate the community giving
Habeas the chance to succeed.

- dan
--
Dan Kohn <mailto:***@dankohn.com>
<http://www.dankohn.com/> <tel:+1-650-327-2600>


-----Original Message-----
From: Tony L. Svanstrom [mailto:***@svanstrom.com]
Sent: Wednesday, August 21, 2002 2:39 PM
To: Matthew Cline
Cc: SpamAssassin Talk ML
Subject: Re: [SAtalk] SA In The News
Post by Matthew Cline
Post by Tony L. Svanstrom
What'll happen when they start getting low on money, they start
hunting people down for using passwords in subject-lines to
whitelist people, or maybe SpamAssassin and/or Deersoft must start
paying a fee for recognizing their lil haiku.
The nature of our patent is about putting a warrant mark in RFC 2822
X-headers (or in the body)
So no existing authentication method is covered by the patent.
Well, without reading it you don't know for sure, I mean, it isn't a
two-line document...

Anyways, I find the whole thing very interesting, but I think that it
will give some not-very-nice-people ideas and that something like this
could go very wrong when/if popular enough and the company behind it
runs out of money.


/Tony
--
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 ***@svanstrom.com =*= #

perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'



Dan Kohn
2002-08-23 07:13:36 UTC
Permalink
I believe all patents refer to methods. I don't think I'll be able to
convince you to like patents, although they were guaranteed in Clause
I(8)(8) of the Constitution (along with copyright), so they are as
American as apple pie.

The good news is that unless you set out to copy Habeas' business plan,
you are extraordinarily unlikely to infringe our patent (once it's
issued). As Craig pointed out, we use patent infringement only against
anti-spam companies trying to copy our business plan, and we use
copyright and trademark infringement against spammers.

- dan
--
Dan Kohn <mailto:***@dankohn.com>
<http://www.dankohn.com/> <tel:+1-650-327-2600>


-----Original Message-----
From: Matthew Cline [mailto:***@nightrealms.com]
Sent: Tuesday, August 20, 2002 11:42 PM
To: SpamAssassin Talk ML
Subject: Re: [SAtalk] SA In The News
Post by Dan Kohn
Matt, I'm the chairman and co-founder of Habeas, and I am, in general,
opposed to software patents. However, I can reassure you that Habeas
does not have any software patents, because we don't have any
software! Habeas is the first startup that is not doing hardware or
software development, but is instead practicing "legal engineering".
OK, I guess the patent you're applying for isn't a software patent. However,
it does seem similar to a software patent, as it's patenting a
concept/idea/method, and many of the arguments against software patents could
(I think) also be used against this patent.
--
Give a man a match, and he'll be warm for a minute, but set him on fire,
and he'll be warm for the rest of his life.

ICQ: 132152059 | Advanced SPAM filtering software:
http://spamassassin.org


Tony L. Svanstrom
2002-08-23 12:57:02 UTC
Permalink
I'm not sure this is a reply to what I said, but I'll reply anyways (you
really really really should quote what you're replying to...).
Post by Dan Kohn
I believe all patents refer to methods. I don't think I'll be able to
convince you to like patents, although they were guaranteed in Clause I(8)(8)
of the Constitution (along with copyright), so they are as American as apple
pie.
And nothing "American" could ever be a bad thing, right?! *G*
Post by Dan Kohn
The good news is that unless you set out to copy Habeas' business plan, you
are extraordinarily unlikely to infringe our patent (once it's issued). As
Craig pointed out, we use patent infringement only against anti-spam
companies trying to copy our business plan, and we use copyright and
trademark infringement against spammers.
It's going to be interesting to read said patent (I don't believe I'll ever
say that again - anyway, anyone got a link?); I'm very interested in whether it
might (evil lawyer-might...) cover such things as large organizations saying
that people can filter on certain of their added headers.


/Tony
--
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 ***@svanstrom.com =*= #

perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'



Lars Hansson
2002-08-23 13:45:51 UTC
Permalink
On Fri, 23 Aug 2002 00:13:36 -0700
Post by Dan Kohn
I believe all patents refer to methods. I don't think I'll be able to
convince you to like patents, although they were guaranteed in Clause
I(8)(8) of the Constitution (along with copyright), so they are as
American as apple pie.
Well, apple pie isn't american at all, it's british ;)
Besides, the american constitution doesn't apply to other countries anyway
so it's a moot point.

---
Lars Hansson


Darren Coleman
2002-08-23 13:59:04 UTC
Permalink
Today I had my boss haranguing me over how a pornographic email managed to
reach him through SpamAssassin. Further inspection of the email revealed
that it was basically just an image (as these porn emails often are).
There were a few giveaways in the HTML (images in folders called "xxx",
etc) which SA caught, but it fell short of the threshold.

Does anyone know of any open-source filter I can add to Qmail (or similar)
that will scan images (skin tone detection, etc) similar to what
MessageLabs et al sell as a commercial product?

Apologies for this being slightly off-topic.

Regards,

Daz
Matt Sergeant
2002-08-23 14:09:15 UTC
Permalink
Post by Darren Coleman
Today I had my boss haranguing me over how a pornographic email managed to
reach him through SpamAssassin. Further inspection of the email revealed
that it was basically just an image (as these porn emails often are).
There were a few giveaways in the HTML (images in folders called "xxx",
etc) which SA caught, but it fell short of the threshold.
Does anyone know of any open-source filter I can add to Qmail (or similar)
that will scan images (skin tone detection, etc) similar to what
MessageLabs et al sell as a commercial product?
I don't think there's anything open source that does this, I'm afraid.
Not that I've ever seen (and believe me, we'd be very interested if
there was ;-)

I guess blocking porn isn't an itch open source geeks feel the need to
scratch ;-)




Dan Kohn
2002-08-23 07:13:42 UTC
Permalink
Quite true, which is why we appreciate support from the SA community.
Note that until Habeas reaches 100% penetration (don't hold your
breath), it really works best when used in conjunction with a spam
filter like SpamAssassin. SA identifies mail that is likely to be spam.
Habeas helps eliminate false positives, enabling SA to be set to a
tighter threshold.
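
As a rough illustration of that interaction, here is a minimal sketch of score-based filtering with one compensating rule. It is not SpamAssassin's actual code: the scores and the WARRANT_MARK rule name are hypothetical values chosen for the example.

```python
# Hypothetical rule scores; positive means "more spam-like".
RULE_SCORES = {
    "CLICK_BELOW": 1.5,
    "BIG_FONT": 2.0,
    "WARRANT_MARK": -4.0,  # assumed compensating score for a warranted message
}


def total_score(hits):
    """Sum the scores of every rule that fired on a message."""
    return sum(RULE_SCORES[name] for name in hits)


def is_spam(hits, threshold):
    """A message is tagged as spam when its total reaches the threshold."""
    return total_score(hits) >= threshold


# A legitimate newsletter that trips two HTML-ish rules scores 3.5.
newsletter = ["CLICK_BELOW", "BIG_FONT"]
print(is_spam(newsletter, threshold=5.0))   # loose threshold: passes
print(is_spam(newsletter, threshold=3.0))   # tighter threshold: false positive

# The same newsletter carrying the mark drops to -0.5 and passes either way.
print(is_spam(newsletter + ["WARRANT_MARK"], threshold=3.0))
```

The point of the sketch is only that a strongly negative rule lets the threshold come down without legitimate marked mail being caught.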

Although Habeas is designed to work with essentially all spam filters,
it's no coincidence that it fits in so easily to the SA rules. I'm a
huge SpamAssassin fan, and put it right with Tivo and Webwasher as the
three essential technologies that enable me to regain control of my time
in regard to unwanted advertising.

- dan
--
Dan Kohn <mailto:***@dankohn.com>
<http://www.dankohn.com/> <tel:+1-650-327-2600>


-----Original Message-----
From: Craig R.Hughes [mailto:***@deersoft.com]
Sent: Thursday, August 22, 2002 9:29 AM
To: Tony L. Svanstrom
Cc: Matthew Cline; SpamAssassin Talk ML
Subject: Re: [SAtalk] SA In The News


Good luck! They need us a whole heck of a lot more than we need them.

C
Post by Tony L. Svanstrom
maybe SpamAssassin and/or Deersoft must start paying a fee for recognizing
their lil haiku.
Dan Kohn
2002-08-23 07:13:49 UTC
Permalink
Justin, millions of Azerbaijanis are crying out in protest against being
mislabeled a bizarre place. ;-)

In fact, the "bizarre" countries not covered by copyright are:
Afghanistan, Bhutan, Ethiopia, Iran, Iraq, Nepal, Oman, San Marino,
Tonga and Yemen.

http://www.habeas.com/faq/index.htm#5.4

Note that the HIL becomes essential in stopping spam from these places.
However, if you could rope spam off into those 10 corners of the world,
we'd be making a lot of progress.

- dan
--
Dan Kohn <mailto:***@dankohn.com>
<http://www.dankohn.com/> <tel:+1-650-327-2600>


-----Original Message-----
From: Justin Mason [mailto:***@jmason.org]
Sent: Wednesday, August 21, 2002 5:36 AM
To: Tony L. Svanstrom
Cc: Matthew Cline; SpamAssassin Talk ML
Subject: Re: [SAtalk] SA In The News
Post by Tony L. Svanstrom
Yes, saying that they'll sue anyone using it is easy, but the sad
fact is that even if they're Bill Gates they can't reach a lot of
the spammers out there; not to mention that it isn't even sure thing
that their text is/could be copyrighted in every part of the world.
well, that's one good thing that came out of GATT and WIPO, along with
pushing the US software patent agenda and all that crap. It *is*
copyrighted worldwide, afaik, apart from a few bizarre places like
Azerbaijan etc.

--j.


Dan Kohn
2002-08-23 07:13:33 UTC
Permalink
The Habeas Infringers List (HIL) is essential for dealing with spammers
in certain slow-moving jurisdictions. Note, though, that MSFT has
successfully pursued copyright cases in China, so it's not impossible.
(Insert WIPO, WTO and other random treaty acronyms here.)

http://www.habeas.com/faq/index.htm#6

- dan
--
Dan Kohn <mailto:***@dankohn.com>
<http://www.dankohn.com/> <tel:+1-650-327-2600>


-----Original Message-----
From: Craig R.Hughes [mailto:***@deersoft.com]
Sent: Tuesday, August 20, 2002 11:39 PM
To: Matthew Cline
Cc: SpamAssassin Talk ML
Subject: Re: [SAtalk] SA In The News


It's not really a software patent. As I understand it, they are
patenting the concept of putting a copyrighted string in an
RFC822 header in order to identify spam, and in order to create
a private right of action against spammers. It actually is a
really neat legal trick. In fact, patenting this is not
necessarily a bad thing anyway, since it really only makes sense
to have one such header to check for -- if there are hundreds,
it'll dilute the effectiveness of the method. I've been
talking with Dan Kohn about this concept for a while now (since
the time I was contemplating Deersoft), and I think it sounds
like quite a promising concept -- we'll see how well they do at
enforcing their copyright against Chinese spammers and the like.

C
Post by Matthew Cline
The negative thing about their system is that it's "patent pending", and I
don't exactly like the idea of helping a business that uses software patents,
but if it helps to reduce false positives...
Dan Kohn
2002-08-23 07:13:46 UTC
Permalink
Craig and I hashed out this philosophical argument a few months ago
(resulting from the fact that we both spent too much time being educated
in the liberal arts).

Philosophically, I believe that the natural state of humankind is
Hobbesian: nasty, brutish, and short. What stops people from rampaging
down the streets, raping and pillaging, is a set of laws that are
commonly agreed to be reasonable. It's unrealistic to assume that
solely technological solutions will solve the fundamentally social
problem of spam. Society operates by setting reasonable rules and then
holding people to them. Habeas is a transformative implementation of
copyright and patent law that builds a new social contract for
reclaiming email, the central communications medium of our time.

As to Craig's specific example, the key step is when we send the
cease-and-desist letter to the well-meaning housewife just trying to
save her farm. We explain that she is welcome to keep spamming if she
wants to (though we recommend against it), but that she is
misappropriating Habeas' intellectual property, and that if she doesn't
stop, she is subject to a multi-million dollar civil suit. Now, if she
stops, then it's hardly in Habeas' interest to pursue a lawsuit. But if
she doesn't, then at some point "But I'm a nice Midwestern housewife" is
no longer a sufficient defense against stealing.

In that sense, Habeas is very much based on the Hobbesian view that you
get the behavior you tolerate. Needless to say, I expect our first civil
suit will be against not a Midwestern housewife, but the most egregious
imaginable spammer, preferably someone who both pushes child pornography
and beats his wife.

Note that the Habeas Infringers List is essential to the Habeas system,
in that it stops infringing spammers until we can get an injunction,
especially in places with slow moving legal systems.
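
For filter authors, the practical consequence is that the mark should only count when the sender is not on the infringers list. The sketch below shows that check under stated assumptions: the header name, the mark text, and the use of an in-memory set of IP addresses are all hypothetical stand-ins (a real deployment would presumably query the list over DNS, as with other RBLs).

```python
from email import message_from_string

MARK_HEADER = "X-Example-Warrant-Mark"   # hypothetical header name
MARK_TEXT = "example warranted text"     # hypothetical mark contents


def mark_is_honored(raw_message, sender_ip, infringers):
    """Honor the mark only if it is present AND the sender is not a listed infringer.

    raw_message -- full message text, headers included
    sender_ip   -- address of the relay that delivered the message
    infringers  -- set of listed addresses (stand-in for a DNS blocklist lookup)
    """
    msg = message_from_string(raw_message)
    has_mark = msg.get(MARK_HEADER, "").strip() == MARK_TEXT
    return has_mark and sender_ip not in infringers


marked = "X-Example-Warrant-Mark: example warranted text\nSubject: hi\n\nbody\n"
listed = {"192.0.2.99"}
print(mark_is_honored(marked, "192.0.2.1", listed))   # marked, sender not listed
print(mark_is_honored(marked, "192.0.2.99", listed))  # marked, but sender is listed
```

A spammer who copies the mark gets its benefit only until the sending address is listed, which covers the gap before any legal action takes effect.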

http://www.habeas.com/faq/index.htm#6

- dan
--
Dan Kohn <mailto:***@dankohn.com>
<http://www.dankohn.com/> <tel:+1-650-327-2600>


-----Original Message-----
From: Craig R.Hughes [mailto:***@deersoft.com]
Sent: Thursday, August 22, 2002 9:36 AM
To: Tony L. Svanstrom
Cc: Matthew Cline; SpamAssassin Talk ML
Subject: Re: [SAtalk] SA In The News


For me the question of enforcement comes down to who the
"average" spammer is. I personally think a huge volume of spam
comes from some mom&pop operation where Mom bought into a
get-rich-quick scheme "Make money from your living room" type
thingie, hooked up a DSL line out to the farmhouse in rural
Iowa, and started sending out medium volumes of spam for a
living, not knowing how much it pisses people off. In her mind,
Mom is doing nothing particularly wrong. Heck, her list provider
assured her it was a double-confirmed-super-opt-in list. All
those people want to get the spam. And the guy who sold her the
spamming software for $59.95 told her it was fully legit, and
would get through all those dumb spam filters out there which
weren't smart enough to realize she isn't actually a spammer,
since it's a double-super-opt-in list (not). So the software
automatically inserts Habeas SWE headers. Now Habeas gets wind
that Mom is spamming, and using their headers in an unlicensed
way. Habeas sues mom? Wins? Gets a lien on the farm? Evicts
mom and pop?

If a large enough number of spammers are of this type (and I
think they probably are), then Habeas could have problems. And
that's not even considering the Korean/Chinese spammers who are
probably not even reachable by Habeas' enforcement agents. Now
to deal with both of these more cheaply than through legal
means, Habeas does have the HIL RBL system... It could be
they'll do OK.

C
Post by Tony L. Svanstrom
Yes, saying that they'll sue anyone using it is easy, but the sad fact is
that even if they're Bill Gates they can't reach a lot of the spammers out
there; not to mention that it isn't even a sure thing that their text
is/could be copyrighted in every part of the world.



Steve Evans
2002-08-23 16:29:15 UTC
Permalink
I looked into a lot of the commercial products and ran some tests. Any
product that claims to block n% of inappropriate images is only
reaching that level because it blocks n% of everything. I didn't find a
single product that really did anything remotely useful.
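
To put that in numbers: a catch rate on unwanted images only means something when set against the block rate on legitimate ones. A minimal sketch with made-up counts (illustrative only, not measurements from any product; the function name is hypothetical):

```python
def block_rates(bad_blocked, bad_total, good_blocked, good_total):
    """Return (catch rate on unwanted images, block rate on legitimate ones)."""
    return bad_blocked / bad_total, good_blocked / good_total

# Hypothetical numbers for illustration only.
catch, collateral = block_rates(bad_blocked=90, bad_total=100,
                                good_blocked=900, good_total=1000)

# Equal rates mean the filter is not discriminating at all.
discriminates = catch > collateral
print(f"catch={catch:.0%} collateral={collateral:.0%} useful={discriminates}")
```

A filter like this one "blocks 90% of inappropriate images" only because it blocks 90% of everything.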

Steve Evans
(619) 594-0653

-----Original Message-----
From: Matt Sergeant [mailto:***@startechgroup.co.uk]
Sent: Friday, August 23, 2002 7:09 AM
To: Darren Coleman
Cc: spamassassin-***@lists.sourceforge.net
Subject: Re: [SAtalk] OT: Spam related - porn filtering
Post by Darren Coleman
Today I had my boss haranguing me over how a pornographic email managed
to reach him through SpamAssassin. Further inspection of the email
revealed that it was basically just an image (as these porn emails often
are). There were a few giveaways in the HTML (images in folders called
"xxx", etc) which SA caught, but it fell short of the threshold.
Does anyone know of any open-source filter I can add to Qmail (or
similar) that will scan images (skin tone detection, etc) similar to what
MessageLabs et al sell as a commercial product?
I don't think there's anything open source that does this, I'm afraid.
Not that I've ever seen (and believe me, we'd be very interested if
there was ;-)

I guess blocking porn isn't an itch open source geeks feel the need to
scratch ;-)






Dan Kohn
2002-08-23 17:37:46 UTC
Permalink
I haven't seen anything on Bonded Sender that would make me think they
violate the pending Habeas patent. For more on Bonded Sender, see:

http://www.habeas.com/faq/index.htm#7.5

- dan
--
Dan Kohn <mailto:***@dankohn.com>
<http://www.dankohn.com/> <tel:+1-650-327-2600>


-----Original Message-----
From: Bart Schaefer [mailto:***@zanshin.com]
Sent: Thursday, August 22, 2002 8:13 AM
To: SpamAssassin Talk ML
Subject: Re: [SAtalk] SA In The News
Post by Matthew Cline
Post by Harold Hallikainen
http://www.wired.com/news/technology/0,1282,54645,00.html
Summary: a company will offer short snippets of original, copyrighted
and trademarked text that can be inserted into email message headers,
and email filters can recognize this as a "not-spam" indicator. Any
spammers who use the text will be sued for copyright and trademark
infringement.
They may be in for a patent fight before any of this goes forward:

http://www.eweek.com/article2/0,3959,476558,00.asp

"Banking on the fact that few enterprises sending commercial mail want to
be associated with spam, IronPort Systems Inc., in San Bruno, Calif., has
developed the Bonded Sender program in an effort to give legitimate bulk
e-mailers some credibility."



Tony L. Svanstrom
2002-08-23 18:27:26 UTC
Permalink
Post by Dan Kohn
I haven't seen anything on Bonded Sender that would make me think they
Part of that FAQ says this:

# Habeas has a patent pending on the using RFC 2822 headers with trademark
# and copyright infringement prosecution to enforce a sender's warranty that
# their mail is "not spam".

So, if I claim that e-mails with certain information in the headers aren't
spam, then, if you do get that patent and I legally enforce keeping that
information out of spam, I'm in trouble no matter what header it is?



/Tony
--
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 ***@svanstrom.com =*= #

perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'



Craig R.Hughes
2002-08-23 21:50:40 UTC
Permalink
Only if you've copyrighted and/or trademarked the header
contents, and are using trademark/copyright law to take action
against spammers who are using those headers inappropriately.

C
Post by Tony L. Svanstrom
# Habeas has a patent pending on the using RFC 2822 headers with trademark
# and copyright infringement prosecution to enforce a sender's warranty that
# their mail is "not spam".
So, if I claim that e-mails with certain information in the headers aren't
spam, then, if you do get that patent and I legally enforce keeping that
information out of spam, I'm in trouble no matter what header it is?
Tony L. Svanstrom
2002-08-23 22:47:56 UTC
Permalink
Post by Craig R.Hughes
Post by Tony L. Svanstrom
So, if I claim that e-mails with certain information in the headers
aren't spam, then, if you do get that patent and I legally enforce
keeping that information out of spam, I'm in trouble no matter what header
it is?
Only if you've copyrighted and/or trademarked the header
contents, and are using trademark/copyright law to take action
against spammers who are using those headers inappropriately.
And now for the important Q: Does the patent exclude such things as my suing
a spammer for having my school's/company's/organization's organization-header
in his spam?


/Tony
--
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 ***@svanstrom.com =*= #

perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'



Craig R.Hughes
2002-08-23 21:37:47 UTC
Permalink
FYI I just checked Ironport bonded sender stuff into CVS on both
branches. Also got a nice patch contributed from Ironport which
allows per-RBL customization of how many received lines to scan,
and whether to use the most-recent received lines, or the oldest
(forgedest?) lines.
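
A rough sketch of the idea behind that patch, written in Python rather than SpamAssassin's own Perl: each RBL gets its own limit on how many Received lines to look at, and its own choice of whether to count from the newest or the oldest. The function and parameter names are hypothetical and are not the patch's actual configuration options.

```python
from email import message_from_string

def received_to_scan(raw_message, max_lines, newest_first=True):
    """Pick which Received headers an RBL check would look at.

    Each relay prepends its Received header, so the first one in the
    message is the most recent hop and the last one is the oldest.
    """
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    if max_lines <= 0:
        return []
    if newest_first:
        return received[:max_lines]
    return received[-max_lines:]

# Made-up message using reserved example domains.
sample = (
    "Received: from mx.example.net by mail.example.org\n"
    "Received: from relay.example.com by mx.example.net\n"
    "Received: from dialup.example.invalid by relay.example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

print(received_to_scan(sample, 1, newest_first=True))
print(received_to_scan(sample, 1, newest_first=False))
```

The oldest lines are the ones a sender could have forged, which is why the choice of direction matters per list.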

C
Post by Dan Kohn
I haven't seen anything on Bonded Sender that would make me think they
http://www.habeas.com/faq/index.htm#7.5
Dan Kohn
2002-08-23 22:42:09 UTC
Permalink
Right. But given that it's "priced to move" for individuals and ISPs
(and only $200 for enterprises), why not just use Habeas rather than
recreating our system?

- dan
--
Dan Kohn <mailto:***@dankohn.com>
<http://www.dankohn.com/> <tel:+1-650-327-2600>


-----Original Message-----
From: Tony L. Svanstrom [mailto:***@svanstrom.com]
Sent: Friday, August 23, 2002 11:27 AM
To: Dan Kohn
Cc: SpamAssassin Talk ML
Subject: RE: [SAtalk] SA In The News
Post by Dan Kohn
I haven't seen anything on Bonded Sender that would make me think they
Part of that FAQ says this:

# Habeas has a patent pending on the using RFC 2822 headers with trademark
# and copyright infringement prosecution to enforce a sender's warranty
# that their mail is "not spam".

So, if I claim that e-mails with certain information in the headers
aren't spam, then, if you do get that patent and I legally enforce
keeping that information out of spam, I'm in trouble no matter what
header it is?



/Tony
--
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 ***@svanstrom.com =*= #

perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'



Bart Schaefer
2002-08-23 22:52:39 UTC
Permalink
Post by Dan Kohn
-----Original Message-----
So, if I claim that e-mails with certain information in the headers
aren't spam, then, if you do get that patent and I legally enforce
keeping that information out of spam, I'm in trouble no matter what
header it is?
Right. But given that it's "priced to move" for individuals and ISPs
(and only $200 for enterprises), why not just use Habeas rather than
recreating our system?
I have to say, I just can't believe this will fly. It can't possibly be
legal to use a patent to prevent someone else from defending their own
copyrights or trademarks, and there's plenty of prior art for putting a
copyrighted or trademarked string in an email header (X-Mailer: if nothing
else).



Tony L. Svanstrom
2002-08-23 23:00:54 UTC
Permalink
Post by Dan Kohn
Right. But given that it's "priced to move" for individuals and ISPs
(and only $200 for enterprises), why not just use Habeas rather than
recreating our system?
Maybe I just want to keep to myself and sue spammers if they use my e-mail
address as the From, since my e-mail address is a sign of it not being spam
(nor bulk), or sue them for using my organization-header...?!?!?

It sounds (I still want to read that patent, is it available online?) like
by doing that (suing the spammers for something like using my e-mail address
to take advantage of people's whitelists) I basically have to say that I'm
using something that Habeas has patented...



/Tony
--
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 ***@svanstrom.com =*= #

perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'



Robert L Mathews
2002-08-23 23:15:53 UTC
Permalink
Post by John Rudd
It is entirely tangential to open source. Your beef is not with patents
vs open source, but patents vs free software.
For example, if I patent some code in my product, I could feel more
comfortable distributing the source to my customers because I have some
other mechanism to protect my IP than "obscurity and difficulty of
reverse compilation/engineering". I can instead make the source
available (and thus allow the customer to make customizations and such),
but restrict use of the code to those who have purchased a license
(which may be more or less effective depending on the market in
question). It makes me MORE likely to move toward open source if legal
(not software token) license restrictions alone are effective (Ex: a
software patent in the transmission computer of a car, where I as the
patent holder could write up the code and send it to toyota including my
code as an example of the algorithm, giving them the source so they can
customize it, and requiring that they pay me a license fee per car sold
that uses my patent and sue them if I find my alg in their prom, and
they haven't paid me).
The people for whom software patents are bad are the free software
community, who oppose any form of code-slavery. Don't confuse "Open
Source Software" with "Free Software". They're related, but distinct,
concepts.
Well, I don't think you're really disagreeing with the point; this is
just a quibble over the definition of "open source" and "free software".
Your example of "open source" would not meet the OSI definition at:

http://www.opensource.org/docs/definition.php

SpamAssassin is "open source" in the OSI sense, since it's under the
Artistic License. That's the sense in which I was using the term "open
source" -- I didn't mean "proprietary software that has the source
available".

Patented concepts can't be used in software distributed under any OSI
"open source" license, unless the patent holder grants a universal
perpetual free license to use the concept. SpamAssassin isn't actually
using the Habeas concept (inserting the mark), so it's not a direct
problem, but such a patent would obviously prevent authors of mail
programs from including their own code to do something similar to what
Habeas does, using their own mark standards. If you're the author of an
open source mail program (or a proprietary one, for that matter), such a
patent certainly restricts what you can do.

And again, maybe that's still good for the overall community in some
cases, like when the concept can't really work unless some organization
has deep pockets to implement it, which might be the case here.
Restricting the rights of mail program authors may be worth it in
exchange for lowering spam in the real world -- lowering the amount of
spam may be more valuable than the loss of freedom the patent creates,
when you consider the tradeoff.

But that isn't what many people have been saying. They've been saying
either they don't care in the least about the restriction side of it, or
they don't believe that patents restrict "open source" (OSI sense) at
all. That's not the same thing as "the tradeoff is worth it."

------------------------------------
Robert L Mathews, Tiger Technologies



Robert L Mathews
2002-08-25 04:00:50 UTC
Permalink
Post by Dan Kohn
Right. But given that it's "priced to move" for individuals and ISPs
(and only $200 for enterprises), why not just use Habeas rather than
recreating our system?
Two reasons:

First, people may want to use the idea in ways that you don't provide.
For example, a similar idea is a mark that can only be used with mail
that meets some "kid friendly" standard, so that e-mail for teenagers
could be whitelisted. Or a mark that companies can only use to send mail
to people who have given them money, so that people could whitelist
otherwise "spammy" mail from companies they've really done business with
(*cough*Amazon*cough*). I could easily think of a dozen others, not all
of which Habeas is likely to implement.

Secondly, if something were to change in the future and your company
becomes Evil(tm), the community may decide that you aren't implementing
the idea in a beneficial manner. For example, people might decide that
your definition of "spam" two years from now isn't restrictive enough. If
that happened, nobody else could create a more restrictive version.

I was thinking of how to illustrate potential problems with patenting
anti-spam ideas, and I think perhaps RBLs are a good example. The initial
RBL was a novel idea that someone could conceivably have tried to patent.
Much like the Habeas system, most people probably wouldn't have objected
to such a patent as long as it worked to reduce spam and they could use
it for free. But things changed (some early RBL operators were dangerous
lunatics, and some of them were shut down as a result of disputes over
how they ran things), and other people wanted to set up their own RBLs.
Gradually, the community decided to use the idea in different ways, and
it evolved in a manner that was harmful to some of the specific RBL
operators, but good for the greater community.

If RBLs had been patented, we'd be relying on one company's opinion about
who should be listed, how it should be run, etc. I don't think many
people would be happy with that.

The same is true of the Habeas system: if the goals of the "anti-spam
community" (whatever that is) ever come into conflict with the goals of
Habeas, a patent means that things go the way Habeas wants, and (unlike
RBLs) not the way the community wants.

I can't blame Habeas for wanting complete control and a monopoly on the
idea, but the consequences of that bear thinking about.

------------------------------------
Robert L Mathews, Tiger Technologies



Michael Moncur
2002-08-25 10:40:18 UTC
Permalink
I know other people have mentioned similar problems before, but they never
happened to me until about a week ago, running the pre-2.40 CVS version.

About once a day I get a spam message that slipped through the cracks - no
spamassassin headers, etc. Is this what happens when spamd is unreachable?
Has anyone else had a higher occurrence of this recently than before?

--
Michael Moncur mgm at starlingtech.com http://www.starlingtech.com/
"A fashion is nothing but an induced epidemic." --George Bernard Shaw



Justin Mason
2002-08-28 10:23:39 UTC
Permalink
Lynx... hide it with javascript...
or, another trick: put this in the <head> block:

<font size=-6><a href=mailto:***@spamtraps.taint.org></a></font>

no browser should render a zero-width href, esp. in the <head> block. ;)

--j.


Chris Petersen
2002-08-28 17:18:57 UTC
Permalink
Post by Justin Mason
no browser should render a zero-width href, esp. in the <head> block. ;)
wouldn't it be easier to just comment it out? Presumably those spiders
grab as much as they can, and wouldn't care about addresses that have been
commented out. and that would NEVER be rendered in a browser.

-Chris



Malte S. Stretz
2002-08-29 09:48:44 UTC
Permalink
Post by Chris Petersen
Post by Justin Mason
<font size=-6><a href=mailto:***@spamtraps.taint.org></a></font>
no browser should render a zero-width href, esp. in the <head> block. ;)
wouldn't it be easier to just comment it out? Presumably those spiders
grab as much as they can, and wouldn't care about addresses that have
been commented out. and that would NEVER be rendered in a browser.
Maybe that would be worth a test: Set up three addresses, one hidden by
white-on-white, one by an empty link and one commented out. Then see how
"intelligent" the spambots are (eg. which addresses are spammed).

Malte
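
A minimal sketch of how that three-address test could be set up and scored, in Python. The trap addresses are hypothetical (example.org is reserved for documentation), the white-on-white variant assumes a white page background, and the recipient log is made up just to show the shape of the tally.

```python
from collections import Counter

# Hypothetical trap addresses, one per hiding method.
TRAPS = {
    "white_on_white": "trap-white@example.org",
    "empty_link": "trap-empty@example.org",
    "commented_out": "trap-comment@example.org",
}

def trap_markup(method, address):
    """Return an HTML fragment hiding `address` using the given method."""
    if method == "white_on_white":
        return (f'<font color="#ffffff">'
                f'<a href="mailto:{address}">{address}</a></font>')
    if method == "empty_link":
        return f'<a href="mailto:{address}"></a>'
    if method == "commented_out":
        return f"<!-- mailto:{address} -->"
    raise ValueError(f"unknown method: {method}")

def tally(recipients):
    """Count how much mail each hiding method attracted."""
    by_address = {addr: method for method, addr in TRAPS.items()}
    return Counter(by_address[r] for r in recipients if r in by_address)

for method, address in TRAPS.items():
    print(trap_markup(method, address))

# Made-up recipient log; mail to non-trap addresses is ignored.
print(tally(["trap-empty@example.org", "trap-empty@example.org",
             "trap-comment@example.org", "someone@example.org"]))
```

Whichever method collects the most mail is the one the harvesters parse most eagerly.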
--
-- Coding is art.
--
Justin Mason
2002-09-03 17:02:26 UTC
Permalink
- document that any existing AWL will become obsolete and be
regenerated from scratch, or
Any existing AWL becomes obsolete anyway because of the new IP-AWL code,
doesn't it? If so, I vote for this one...
There's migration code already for this -- it'll upgrade an existing
entry based on the first IP it sees for that username.
So I guess I'll revert the code then.

Does doing a 2.41 update tomorrow morning (GMT) make sense for everyone?

--j.


Matt Sergeant
2002-09-03 17:10:56 UTC
Permalink
Post by Justin Mason
- document that any existing AWL will become obsolete and be
regenerated from scratch, or
Any existing AWL becomes obsolete anyway because of the new IP-AWL code,
doesn't it? If so, I vote for this one...
There's migration code already for this -- it'll upgrade an existing
entry based on the first IP it sees for that username.
So I guess I'll revert the code then.
Does doing a 2.41 update tomorrow morning (GMT) make sense for everyone?
Please test it works on a default 5.00503 system before releasing. You
can use sourceforge's compile farm for that.

Alternatively if you give it a little more time I can test it.

Matt.




CertaintyTech
2002-09-03 17:43:56 UTC
Permalink
Post by Matt Sergeant
Please test it works on a default 5.00503 system before releasing. You
can use sourceforge's compile farm for that.
Alternatively if you give it a little more time I can test it.
Matt.
To summarize what I have seen with my 5.00503 Perl system:

1. Edited line 39 of Mail/SpamAssassin/SHA1.pm. Changed:

no warnings "uninitialized";

to:

#no warnings "uninitialized";

2. Edited line 74 of Mail/SpamAssassin/SHA1.pm. Changed:

@W = unpack N16, $_."\0"x7;

to:

@W = unpack "N16", $_."\0"x7;


Once I made these changes then spamd would start and everything now looks
good.
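
For background: the `warnings` pragma first shipped with Perl 5.6, which is presumably why the first line has to go on 5.005. As for the second change, the `"N16"` template asks unpack for sixteen unsigned 32-bit big-endian integers, which is how SHA-1 reads each 64-byte block. A small Python sketch of the same operation, for anyone who doesn't read Perl's pack templates (the function name is made up for illustration):

```python
import struct

def block_to_words(block):
    """Split a 64-byte SHA-1 block into sixteen 32-bit big-endian words."""
    if len(block) != 64:
        raise ValueError("SHA-1 works on 64-byte blocks")
    # ">16I" is the struct counterpart of Perl's "N16" template.
    return list(struct.unpack(">16I", block))

# The padded single block for the message "abc" (24 bits long).
block = b"abc" + b"\x80" + b"\x00" * 59 + b"\x18"
words = block_to_words(block)
print(hex(words[0]), words[15])
```

One difference from the Perl line: struct.unpack insists on exactly 64 bytes, so the sketch checks the length up front.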

Ed.




Michael Stauber
2002-09-03 20:30:42 UTC
Permalink
Hi Ed,
Post by CertaintyTech
#no warnings "uninitialized";
@W = unpack "N16", $_."\0"x7;
Once I made these changes then spamd would start and everything now looks
good.
I can confirm that applying these changes solved the problem on the Sun Cobalt
RaQ3, RaQ4 and Qube 3, which all run Perl-5.005.

However, SA then complained that the installed HTML::Parser (v2.27) was too
old and HTML::Parser >V3.XX was needed.

So I ended up having to upgrade / install the following two modules:

perl-HTML-Tagset-3.03
perl-HTML-Parser-3.26

Now SA-2.40 works fine.

I suggest adding the requirement for perl-HTML-Parser-3.XX in the
documentation or adding some tests in ./configure which check the presence
and version number of HTML::Parser.
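
The check itself is just a numeric comparison against a minimum. A minimal sketch of the logic, in Python for illustration only (a real check would live in the Perl build scripts). 2.27 and 3.26 are the versions mentioned above; the 3.0 floor is an assumption standing in for the unspecified "3.XX".

```python
from decimal import Decimal

def meets_minimum(installed, minimum):
    """True if a decimal module version is at least the required minimum."""
    return Decimal(installed) >= Decimal(minimum)

# Assumed floor; the exact minimum HTML::Parser version is not stated here.
MINIMUM = "3.0"

for found in ("2.27", "3.26"):
    status = "ok" if meets_minimum(found, MINIMUM) else "too old"
    print(f"HTML::Parser {found}: {status}")
```

Decimal is used so that versions compare as the decimal numbers they are, without floating-point surprises.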
--
With best regards,

Michael Stauber
***@solarspeed.net
Unix/Linux Support Engineer



Malte S. Stretz
2002-09-03 20:51:33 UTC
Permalink
Post by Michael Stauber
[...]
However, SA then complained that the installed HTML::Parser (v2.27) was
too old and HTML::Parser >V3.XX was needed.
Done. `make` now complains if the HTML::Parser version is incorrect.
Post by Michael Stauber
[...]
M
--
-- Coding is art.
--