-
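An all-night release and rollback caused by a package dependency
Posted on April 16, 2010. No comments.
Quoting an email from Yao Ming of the architecture team: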
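The code in intl-biz/wsproduct was changed as follows:
public void setIndByAll(Integer indByAll) → public void setIndByAll(Integer indByAll, Object... objects)
The corresponding caller in intl-biz/wholesalesearch:
WsProductDO.setIndByAll
intl-biz/wholesalesearch depends on intl-biz/wsproduct, yet its source compiles whether or not the code in intl-biz/wsproduct has been changed, because a call with a single Integer argument is valid against either version.
So where is the problem? The method signatures differ, and a compiled call site records the exact signature it was compiled against.
Our build script has a fatal flaw here:
It compiles intl-biz/wholesalesearch first, at which point the method signature is: WsProductDO.setIndByAll(Ljava/lang/Integer;)
Only afterwards does it compile intl-biz/wsproduct, at which point the method signature changes to WsProductDO.setIndByAll(Ljava/lang/Integer;[Ljava/lang/Object;)
So in production intl-biz/wholesalesearch requires the former while intl-biz/wsproduct provides the latter, which produced the exception everyone saw.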
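My take:
- Reduce unnecessary package dependencies; whatever can go through a remote service should go through a remote service;
- When a low-level interface signature changes, the affected parties must be notified. If the person making the change does not know which teams are affected, they must announce it in the release chat group and ask the architecture team for a REVIEW; the architecture team tends to have a more complete picture;
- The compile order at release time is critical, and the list that defines this order must be kept up to date at all times;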
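One way to keep the compile order from drifting is to derive it from the declared dependencies instead of maintaining it by hand. The following is only a minimal sketch: the `DEPENDENCIES` map and the `build_order` helper are hypothetical, and only the wholesalesearch → wsproduct edge comes from the email above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: module -> set of modules it depends on.
# Only the wholesalesearch -> wsproduct edge is taken from the post.
DEPENDENCIES = {
    "intl-biz/wsproduct": set(),
    "intl-biz/wholesalesearch": {"intl-biz/wsproduct"},
}


def build_order(dependencies):
    """Return the modules so that each one comes after everything it depends on."""
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for module in build_order(DEPENDENCIES):
        print("compile", module)
```

With this ordering, wsproduct is always compiled before wholesalesearch, so the caller is compiled against the signature that is actually shipped.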
-
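Notes on the RSYNC algorithm, compiled by Cheng Feng
Posted on February 4, 2010. No comments.
【The rsync algorithm】
rsync is an open-source tool for fast incremental file transfer. Its core is the "rsync algorithm", which works as follows.
There are two machines, a and b. a holds file A and b holds file B, and A and B are very similar (for example, both derive from the same source file with only a few changes on each side). The rsync algorithm then consists of the following steps: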
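- b splits file B into non-overlapping blocks of s bytes each (for example, if B is 1024 bytes and s is 512 bytes, B is split into 2 non-overlapping blocks). The last block may be shorter than s bytes.
- For each s-byte block, b computes two checksums: a 32-bit weak "rolling" checksum and a 128-bit strong MD4 checksum.
- b sends these checksums to a.
- a examines every s-byte block of A (because the differing content in A may have a length that is not a multiple of s, the blocks of A are taken at every byte offset, i.e. with a step of 1) and looks for blocks whose weak and strong checksums both equal those of some s-byte block of B. (How a finds the blocks of A that equal blocks of B is covered in a later section. This step is computationally intensive, so the weak checksum must be designed carefully to be one-way, fast and cheap to compute.)
- a sends b a sequence of instructions, and b uses them to synchronize file B. Each instruction is either a reference to a block of B or literal data. The literal data consists of the parts of A that match no block of B.
Through these steps, b ends up with a synchronized file B.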
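The first two steps (b splits B into blocks and computes two checksums per block) can be sketched as follows. This is an illustration under stated assumptions, not rsync's code: `block_signatures` is a hypothetical helper, `zlib.adler32` stands in for rsync's own weak checksum, and MD5 stands in for MD4 because MD4 is often unavailable in current `hashlib` builds.

```python
import hashlib
import zlib


def block_signatures(data: bytes, block_size: int):
    """Split data into non-overlapping blocks; return (index, weak, strong) per block."""
    signatures = []
    for index, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]   # the last block may be shorter
        weak = zlib.adler32(block)               # 32-bit stand-in for the weak checksum
        strong = hashlib.md5(block).digest()     # 128-bit stand-in for MD4
        signatures.append((index, weak, strong))
    return signatures


if __name__ == "__main__":
    # The example from the text: a 1024-byte file with s = 512 yields 2 blocks.
    print(len(block_signatures(bytes(1024), 512)))
```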
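【Weak rolling checksum】
In step 4 of the "rsync algorithm", a has to compute the weak "rolling" checksum for every s-byte block at a step of 1. The checksum therefore has to be as cheap to compute as possible. The algorithm, as given in the rsync technical report, is:

$$a(k,l) = \Big(\sum_{i=k}^{l} X_i\Big) \bmod M$$
$$b(k,l) = \Big(\sum_{i=k}^{l} (l-i+1)\,X_i\Big) \bmod M$$
$$s(k,l) = a(k,l) + 2^{16}\,b(k,l)$$

Here s(k,l) is the rolling checksum of the bytes X_k … X_l and M = 2^16. The advantage of this algorithm is that successive values follow from a simple recurrence:

$$a(k+1,l+1) = \big(a(k,l) - X_k + X_{l+1}\big) \bmod M$$
$$b(k+1,l+1) = \big(b(k,l) - (l-k+1)\,X_k + a(k+1,l+1)\big) \bmod M$$

So, given the rolling checksum of the block X_1 … X_n together with the values of X_1 and X_{n+1}, the rolling checksum of X_2 … X_{n+1} is easy to compute.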
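This is only a brief introduction; the details are left out for lack of space. The weak "rolling" checksum is quite an ingenious algorithm, which is why it gets a little extra room here. It is in fact derived from the Adler-32 checksum, which interested readers may want to look up.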
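The rolling property can be checked with a small sketch. It assumes the definition from the rsync technical report, a = sum of the bytes mod 2^16, b = sum of the bytes weighted by their distance from the end of the block mod 2^16, s = a + 2^16·b. The function names `weak_checksum` and `roll` are hypothetical.

```python
M = 1 << 16  # modulus used for both halves of the checksum


def weak_checksum(block: bytes):
    """Compute (a, b, s) for a block directly from the definition."""
    n = len(block)
    a = sum(block) % M
    # The byte at 0-based position i carries weight (n - i), i.e. (l - i + 1).
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b, a + (b << 16)


def roll(a: int, b: int, block_len: int, outgoing: int, incoming: int):
    """Slide the window one byte: drop `outgoing`, append `incoming`."""
    a = (a - outgoing + incoming) % M
    b = (b - block_len * outgoing + a) % M
    return a, b, a + (b << 16)


if __name__ == "__main__":
    data = bytes(range(200)) * 3
    n = 16
    a, b, s = weak_checksum(data[:n])
    for start in range(1, len(data) - n + 1):
        a, b, s = roll(a, b, n, data[start - 1], data[start + n - 1])
        assert (a, b, s) == weak_checksum(data[start:start + n])
    print("rolling update matches direct computation")
```

Each slide costs a constant number of operations regardless of the block size, which is what makes checking every byte offset affordable.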
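【Checksum search】
This section describes how machine a finds blocks with matching checksums in step 4 of the "rsync algorithm".
Once machine a has received the checksum list that machine b computed for file B, it searches the s-byte blocks of file A at every offset for blocks with identical content. The basic strategy is that a computes the 32-bit rolling checksum of each s-byte block in turn (advancing the offset by 1) and then looks that checksum up in the list sent by b. rsync implements this with a simple three-level lookup.
For the first level, machine a computes a 16-bit hash of each 32-bit weak rolling checksum in the list received from b and sorts the list by that hash. It also creates a hash table with 2^16 entries. Each entry points to the first element of the sorted checksum list whose 16-bit hash equals the entry's index, or to null if the sorted list contains no element with that hash.
For the s-byte block at each offset of file A, machine a computes the 32-bit rolling checksum and its 16-bit hash. If the entry for that 16-bit hash in the 2^16-entry table is not null, the second level is entered.
In the second level, machine a scans the sorted checksum list starting at the position the hash table entry points to. The scan continues until it reaches an element whose 16-bit hash differs from the one being looked up. If an element with the same 32-bit rolling checksum is found, the third level is entered.
In the third level, the strong MD4 checksum of the s-byte block is compared with the strong MD4 checksum of the list element. If they are equal, the blocks are assumed to be identical. It is possible for two blocks to have the same weak rolling checksum and the same strong MD4 checksum and still differ in content, but the probability is so small that the rsync algorithm ignores this case.
When a matching block is found, machine a sends machine b the data between the end of the previous match (previous match offset + s) and the current offset, followed by the index of the matching block in file B. This is sent as soon as the match is found, so that a's checksum search and b's file reconstruction can run in parallel.
If no matching checksum is found at the current offset, machine a advances to the next s-byte block (this time with a step of 1), updates the rolling checksum and repeats the first-level lookup. If a match is found, the search restarts right after the matching block (this time with a step of s). When A and B are very similar, this small trick saves a great deal of unnecessary computation (the same trick is common in substring search).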
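The search described above can be sketched end to end. This is a simplified illustration, not rsync's implementation: a dict keyed by the 16-bit hash replaces the sorted list plus 2^16-entry table, MD5 stands in for MD4, the 16-bit hash function is an arbitrary choice, a trailing short block of the old file is not indexed, and all function names are hypothetical.

```python
import hashlib

M = 1 << 16


def weak(block: bytes):
    """Return the (a, b) halves of the weak checksum of a block."""
    n = len(block)
    return sum(block) % M, sum((n - i) * x for i, x in enumerate(block)) % M


def hash16(a: int, b: int) -> int:
    return (a + b) % M  # assumed 16-bit hash of the 32-bit weak checksum


def build_index(old: bytes, s: int):
    """Map hash16 -> [(weak32, strong, block_index)] for each full block of old."""
    table = {}
    for start in range(0, len(old) - s + 1, s):
        block = old[start:start + s]
        a, b = weak(block)
        entry = (a | (b << 16), hashlib.md5(block).digest(), start // s)
        table.setdefault(hash16(a, b), []).append(entry)
    return table


def delta(new: bytes, table, s: int):
    """Return instructions: ("data", bytes) or ("block", index into old)."""
    out, pos, literal_start, a, b = [], 0, 0, None, None
    while pos + s <= len(new):
        if a is None:                      # first window, or just after a match
            a, b = weak(new[pos:pos + s])
        match = None
        for weak32, strong, idx in table.get(hash16(a, b), ()):       # level 1
            if weak32 == a | (b << 16):                                # level 2
                if strong == hashlib.md5(new[pos:pos + s]).digest():   # level 3
                    match = idx
                    break
        if match is not None:
            if literal_start < pos:
                out.append(("data", new[literal_start:pos]))
            out.append(("block", match))
            pos += s                       # step of s after a match
            literal_start, a, b = pos, None, None
        else:
            if pos + s < len(new):         # roll the window forward by one byte
                a = (a - new[pos] + new[pos + s]) % M
                b = (b - s * new[pos] + a) % M
            pos += 1                       # step of 1 when nothing matched
    if literal_start < len(new):
        out.append(("data", new[literal_start:]))
    return out


def apply_delta(old: bytes, instructions, s: int) -> bytes:
    """Rebuild the new file from old blocks and literal data."""
    return b"".join(v if kind == "data" else old[v * s:(v + 1) * s]
                    for kind, v in instructions)
```

In the roles used by the text, `old` is file B on machine b and `new` is file A on machine a: b would run `build_index`, a would run `delta`, and b would run `apply_delta` on the result.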
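Below are three diagrams; please bear with them -_-|||
[three diagrams, not preserved here]
A note: I have not read the rsync source code, so these three diagrams are only schematic and may not match the actual implementation exactly.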
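【Summary】
The rsync algorithm was introduced by Andrew Tridgell and Paul Mackerras in a 1998 paper. It was designed mainly for the network conditions of the time: unstable, narrow-band and high-latency links. The rsync algorithm is well suited to synchronizing small, highly similar files, but not to synchronizing collections of files that are large, deeply nested or very numerous. The reason follows from the algorithm itself: the weak "rolling" checksums and strong MD4 checksums have to be computed, transferred and compared for every file in full, whether or not the file has changed.
If the changed files could be marked and only those were synchronized with rsync, efficiency should improve considerably. Many ways of marking modified files are already described online, so I will not repeat them all. A frequently cited one is to turn the directory to be synchronized into a hash tree that uses the modification time as the hash value. On Linux, modifying a file is reflected only in the modification time of the directory that contains it, not in the directories above, so a small program based on the inotify interface is needed to propagate modification times to the parent directories (I have never used inotify myself and am only repeating what others say).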
-
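TTSERVER, a practical distributed KV database
Posted on July 15, 2009. No comments.
Relational databases often run into trouble in Internet applications that store massive amounts of data:
- They have no fast KEY-VALUE hash lookup;
- Their access interfaces do not support HTTP;
- Their read/write TPS is not high enough;
Starting with Google's Bigtable, KEY-VALUE databases have attracted the attention of more and more developers, and KV-structured databases are showing up in more and more Internet projects.
As the demand for distributed technology grows, we want the database we choose to support distribution well by itself, so that machines can be added or removed quickly.
Tokyo Cabinet (TC) is a very good DBM. With Tokyo Tyrant (TT) on top of it, TC provides good distributed storage and lookup, and adding Tokyo Dystopia (TD) gives distributed search as well.
TT performs quite well. I ran a test on an ordinary 1U server.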
[root@www www]# /usr/local/bin/tcrmttest write -port 11211 127.0.0.1 1000000
<Writing Test>
host=127.0.0.1 port=11211 tnum=1 rnum=1000000 nr=0 ext= rnd=0
……………………. (00100000)
……………………. (00200000)
……………………. (00300000)
……………………. (00400000)
……………………. (00500000)
……………………. (00600000)
……………………. (00700000)
……………………. (00800000)
……………………. (00900000)
……………………. (01000000)
record number: 1000001
size: 32430400
time: 46.946
ok
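The host load peaked at 1.6, and one million writes in 46.946 seconds works out to roughly 21,000 writes per second. The test machine has 1 GB of memory and a 2 GHz CPU.
TT is a very good fit wherever Memcache is used.
TT also has some shortcomings. For example, when the data grows beyond what a single TT SERVER can handle and I want to add a machine, the data on the original TT server has to be exported to the other machine by hand. If the data is partitioned with a plain MOD scheme (key hash modulo the number of servers), the application layer has to be changed as well, so an intermediate layer is needed.
If one TT server goes down, the cache of all TT SERVERs gets redistributed, which causes a drop in performance. Consistent hashing handles this situation better.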
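The difference between MOD partitioning and consistent hashing can be sketched as follows. This is a generic illustration rather than anything TT provides: the server names, the `HashRing` class and the choice of MD5 with 100 virtual points per server are all assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash of a string (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def mod_server(key: str, servers):
    """MOD partitioning: most keys move whenever len(servers) changes."""
    return servers[_hash(key) % len(servers)]


class HashRing:
    """Consistent hashing: each server owns many points on a ring."""

    def __init__(self, servers, replicas=100):
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key: str):
        # The first ring point clockwise from the key's hash owns the key.
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]


if __name__ == "__main__":
    servers = ["tt1", "tt2", "tt3"]          # hypothetical server names
    keys = [f"key{i}" for i in range(10000)]
    before, after = HashRing(servers), HashRing(servers[:-1])  # tt3 goes down
    moved_ring = sum(before.server_for(k) != after.server_for(k) for k in keys)
    moved_mod = sum(mod_server(k, servers) != mod_server(k, servers[:-1]) for k in keys)
    print("keys moved with ring:", moved_ring, "with MOD:", moved_mod)
```

With the ring, only the keys that lived on the failed server are reassigned; keys on the surviving servers stay where they are, so their cached data remains usable.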
-
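Adding Chinese character set support when installing mysql5 from ports on BSD6
Posted on April 16, 2007. No comments.
A default install from ports does not include the Chinese character set gbk. The following steps add Chinese support: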
[code]
#cd /usr/ports/databases/mysql51-server/
#make deinstall clean
#rm /var/db
#cd /usr/ports/databases/mysql51-client/
#make deinstall clean
#reboot
#cd /usr/ports/databases/mysql51-server/
#make WITH_CHARSET=gbk WITH_XCHARSET=all WITH_COLLATION=gbk_chinese_ci WITH_PROC_SCOPE_PTH=yes BUILD_OPTIMIZED=yes BUILD_STATIC=yes SKIP_DNS_CHECK=yes WITHOUT_INNODB=yes PTHREAD_LIBS=-lthr install clean
#cd /usr/ports/databases/mysql51-client/
#make WITH_CHARSET=gbk WITH_XCHARSET=all WITH_COLLATION=gbk_chinese_ci WITH_PROC_SCOPE_PTH=yes BUILD_OPTIMIZED=yes BUILD_STATIC=yes SKIP_DNS_CHECK=yes WITHOUT_INNODB=yes PTHREAD_LIBS=-lthr install clean
[/code]
-
sed
Posted on August 1, 2006. 2 comments.
2005-02-17 sed
What is sed?
Sed stands for Stream EDitor. It is called this because input flows through the program to standard out (this output is generally redirected to a file). It is also called a non-interactive editor because the user never alters files interactively (like you might on a screen using vi, pico, or emacs). Instead, the user sends a script of editing instructions to sed, plus the name of the file to edit. In this sense, sed works like a filter, able to delete, insert and change characters, words, and lines of text. Sed is very powerful and able to make many changes to many files in only a matter of minutes.
Some basics:
In all probability, the command you need most is the "s" command. It Substitutes one thing for another. The simplest way to do this is:
sed 's/color/colour/g' filename
The "g" at the end stands for "global". What it really means, though, is to replace every occurrence on the line. If you leave it off, only the first occurrence on each line will be changed.
You will encounter problems if you attempt to use any of the following characters in the string to replace:
.*[]^$
These characters have special meaning to sed. If you mean to replace literal occurrences of those characters, preface them with a backslash. So, don't do
sed 's/[J.S. Bach {$ for music}]/[Bach, J.S {$ for music}]/' filename
Instead, do
sed 's/\[J\.S\. Bach {\$ for music}\]/[Bach, J.S {$ for music}]/' filename
Note that this does not apply to the replacement string.
What if you want to perform more than one such replacement at a time? You might try something like this:
sed 's/color/colour/g' 's/flavor/flavour/g' filename
but it wouldn't work. sed would look for a file named "g" in the directory "s/flavor/flavour". The "-e" flag to sed makes it realize that the next option is a part of the script, instead of a filename. You also must use it for the first part of the script, when you have more than one part. So, you would use
sed -e 's/color/colour/g' -e 's/flavor/flavour/g' filename
If you only had one replacement to do, you could still use the "-e" flag, but you don't need to.
The various commands are applied in the order given to sed, so if you ran
sed -e 's/color/colour/g' -e 's/colour/color/g' filename
it would turn "color" to "colour" and then back to "color". So, all occurrences of "color" or "colour" would end up as "color". This is an inefficient way to do that, though.
What if you want to replace something that contains a '/' character? This is a common problem with filenames. You could escape each one, like so:
sed 's/\/usr\/bin/\/bin/g' filename
This is not fun for long pathnames. There is a nice alternative: sed will treat the character immediately after the 's' as the separator, so you could do something like
sed 's#/usr/bin#/bin#g' filename
Using regular expressions
sed can use regular expressions just like ed(1) can. Here are some common uses of regular expressions.
* The '^' character means the beginning of the line.
sed 's/^Thu /Thursday /' filename
will turn "Thu " into "Thursday ", but only at the beginning of the line. Note that the "g" flag is not used, since you can't have multiple beginnings of a line. Also note that you don't need to put the '^' in the replacement string.
* The '$' character means the end of the line.
sed 's/ $//' filename
will delete a space character that occurs at the end of a line. Again, the "g" flag is not used, and the '$' is not used in the replacement string.
You can "replace" the end of the line, like this:
sed 's/$/EOL/' filename
This does not form one long line, but it puts the string "EOL" at the end of each line.
You can match a blank line by specifying an end-of-line immediately after a beginning-of-line:
sed 's/^$/this used to be a blank line/' filename
* The '.' character means "any character". This does not mean the beginning or end of a line, though. If you were using a log file which had the date in the form "Wed Dec 31 16:00:00 1969" and wanted to erase the dates and times from a certain month and year, you could use
sed 's/Apr .. ..:..:.. 1980/Apr 1980/g' filename
* The square brackets "[]" are used to specify any one of a number of characters. This is useful when you don't know if a letter will be upper or lower case:
sed 's/[Oo]pen[Ww]in/openwin/g' filename
* You can specify a range of characters using a '-' inside the square brackets. This will include any character between (in ASCII terms) the two listed. If you wanted to delete middle initials, you could use
sed 's/ [A-Z]\. / /g' filename
Notice that the literal period had to be escaped, as mentioned above. Also, we had to go from two spaces (one on each side of the middle initial) to one.
* If you want to exclude a set or range of characters, use the '^' character as the first thing inside the brackets:
sed 's/ [^A-DHM-Z]\. / /g' filename
This will delete any middle initials that are not A,B,C,D,H,M,N,…,Z.
* The '*' character means "any number of the previous character". This applies both to literal characters and to characters that are a result of using "[]" or '.'. For example,
sed 's/ *$//' filename
deletes all trailing spaces from each line, while
sed 's/[ 	]*$//' filename
deletes any sequence of trailing tabs and spaces (the brackets contain a space and a literal tab character). It also works when using "[^]":
sed 's/[ ][^ ]*$//' filename
deletes the last word (sequence of non-spaces) on each line.
It is important to know that '*' will match zero occurrences. If you need to match an integer, for example,
sed 's/ [0-9]* / integer /g' filename
will turn "  " (two consecutive spaces) into " integer ", which is not what you want. In this case, you should use
sed 's/ [0-9][0-9]* / integer /g' filename
which will demand at least one digit.
* The combination ".*" means any number of any character. So,
sed 's/col.*lapse/collapse/g' filename
will act on any line which contains the letters "col" and then "lapse", no matter what is in between. The '*' character is greedy: it takes as many characters as it can. So, the above script would turn
a b col d e f lapse h i j k lapse m n
into
a b collapse m n
instead of
a b collapse h i j k lapse m n
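The greedy behaviour can be reproduced outside sed with a short Python check. This is only an illustration using Python's `re` module, not sed itself; the lazy form `.*?` shown for contrast is Python syntax and is not part of the basic regular expressions described in this article.

```python
import re

LINE = "a b col d e f lapse h i j k lapse m n"


def greedy(line: str) -> str:
    # ".*" grabs as much as possible, so the match ends at the LAST "lapse".
    return re.sub(r"col.*lapse", "collapse", line)


def lazy(line: str) -> str:
    # ".*?" grabs as little as possible, so the match ends at the FIRST "lapse".
    return re.sub(r"col.*?lapse", "collapse", line)


if __name__ == "__main__":
    print(greedy(LINE))  # a b collapse m n
    print(lazy(LINE))    # a b collapse h i j k lapse m n
```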
Substitution and Saving
Up to this point, we have concentrated on deleting things that we match with "[]" and '.'. That's because we had no way of saving what we matched. The "\(" and "\)" operators will save whatever is found between them. Notice that these parentheses must be preceded by a backslash, while the characters ^$[].* don't need a backslash to act in a non-literal fashion. The first pair of "\(\)" saves into a place called "\1", and the second pair into "\2", and so on.
sed 's/^\([A-Z][A-Za-z]*\), \([A-Z][A-Za-z]*\)/\2 \1/' filename
will turn "Lastname, Firstname" into "Firstname Lastname". Notice how the comma is placed outside the first pair of "\(\)" so it doesn't get included in the last name. Otherwise, the result would be "Firstname Lastname,".
Sometimes you will want to apply a substitution only to lines that meet some criteria that you can't specify in the string to be replaced. You do this using something called an "address". It comes before the "s" command. You can limit the command to a range of lines:
sed '1,20s/foobar/fubar/g' filename
The line count is cumulative across files, and starts at 1.
You might want to apply a change only to lines that contain a string:
sed '/^Aug/s/Mon /Monday /g' filename
Or to lines that don't contain a string:
using sh or ksh or bash,
sed '/^Aug/!s/Mon /Monday /g' filename
using csh or tcsh,
sed '/^Aug/\!s/Mon /Monday /g' filename
You can also apply the command to all lines between (and including) a start string and a stop string:
sed '/^Aug/,/^Oct/s/Mon /Monday /g' filename
Normally sed reads a line, processes it, and prints it out. If you only want to see the lines that your command acted upon, then you don't want it to print out everything. The "-n" flag will stop sed from printing after processing. So,
sed -n 's/fubar/foobar/g' filename
will print nothing at all. You must use the 'p' flag to the 's' command to make it print out what it has processed:
sed -n 's/fubar/foobar/gp' filename
Sed from a file
If your sed script is getting long, you can put it into a file, like so:
# This file is named "sample.sed"
# comments can only appear in a block at the beginning
s/color/colour/g
s/flavor/flavour/g
s/theater/theatre/g
Then call sed with the "-f" flag:
sed -f sample.sed filename
Or, you can make an executable sed script:
#!/usr/bin/sed -f
# This file is named "sample2.sed"
s/color/colour/g
s/flavor/flavour/g
s/theater/theatre/g
then give it execute permissions:
chmod u+x sample2.sed
and then call it like so:
./sample2.sed filename


