Car Emissions
There are two concepts in car pollution which people generally get mixed up. Some exhaust gases are simply stinky and noxious, most notably particulate carbon and carbon monoxide. These pollutants do direct damage to the people near them and to crops which grow nearby, and are clearly bad. Unfortunately, there isn't much direct economic incentive for any one person to make their car produce less of them.
The other troublesome kind of exhaust is greenhouse gases, mostly carbon dioxide. The amount of damage caused by these is much less clear, and there's a straightforward economic disincentive to produce them, because they correspond pretty much directly to the amount of gas your car consumes. Carbon dioxide also happens to be produced in mass quantities by respiration.
If you really want to know how clean a car is, look it up on the EPA web site. There are some surprises; for example, the Honda Civic Hybrid with a manual transmission has mediocre pollution ratings.
Erasure Codes
People keep asking me about using erasure/rateless/error correcting codes in BitTorrent. It isn't done because, quite simply, it wouldn't help.
One possible benefit of erasure codes is that when sending data to a peer there are so many potential pieces that you can send any random one you have and it won't be a duplicate. The problem is that the peer may already have gotten that same piece from another peer, so that benefit is destroyed, and on top of that the overhead of communicating and remembering which peer has what is increased tremendously.
Possible benefit number two is that erasure codes increase the chances that your peers won't already have the pieces which you've downloaded. But simply downloading pieces which fewer of your peers have first handles that problem quite nicely, so a vastly more complicated solution is unwarranted.
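To make "downloading pieces which fewer of your peers have first" concrete, here is a minimal sketch of rarest-first selection. It is an illustration only, not BitTorrent's actual code: the function name, the use of sets of piece indices as bitfields, and the random tie-breaking are all assumptions.

```python
import random
from collections import Counter


def pick_rarest_piece(have, peer_bitfields, uploader_has, rng=random):
    """Pick a piece we lack, that the uploader has, held by the fewest peers.

    have           -- set of piece indices we already hold (hypothetical model)
    peer_bitfields -- list of sets, one per known peer, of pieces that peer holds
    uploader_has   -- set of piece indices the peer we are requesting from holds
    Returns a piece index, or None if the uploader has nothing we need.
    """
    # Count how many known peers hold each piece.
    counts = Counter()
    for bitfield in peer_bitfields:
        counts.update(bitfield)

    candidates = [p for p in uploader_has if p not in have]
    if not candidates:
        return None

    # Keep only the pieces with the lowest holder count, break ties randomly.
    rarest = min(counts[p] for p in candidates)
    return rng.choice([p for p in candidates if counts[p] == rarest])
```

For example, with peers holding {0, 1, 2}, {0, 1} and {1, 3}, a downloader that already has piece 0 and is talking to a peer offering pieces 1 and 2 would request piece 2, since only one peer holds it.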
Possible benefit number three is that if there's no seed left erasure codes increase the chances that the entire file will be recoverable. In practice, when a file becomes unrecoverable it's because there was only one seed and several downloaders started from scratch, then the seed disappeared after uploading less than the total length of the file. Erasure codes obviously would not help out in that case.
There are other possible benefits and corresponding rebuttals, but they get more complicated. The short of it all is that the possible benefits of erasure codes can be had with much more straightforward and already implemented techniques, and the implementation difficulties of such codes are quite onerous.
While I'm pissing on everyone's parade, I should probably mention another scenario in which everyone wants to use erasure codes and it's a bad idea: off-site backup. If you store everything straightforwardly on each of three backup sites, and each site has two nines (99%) uptime (if it doesn't you shouldn't be using it for backup), then the overall reliability will be six nines (99.9999%), assuming the sites fail independently. Engineering for more than six nines is nothing but intellectual masturbation, because unforeseeable problems completely dominate failure at that point. Therefore one-of-three gets great reliability with unreliable backup sites in exchange for having to store three times the amount of data you're backing up.
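The six-nines figure is just the chance that at least one full copy is reachable. A minimal sketch of that arithmetic, assuming site failures are independent (the function name is made up for illustration):

```python
def replicated_availability(site_uptime, copies):
    """Probability that at least one of `copies` full replicas is up.

    Assumes each site is up independently with probability `site_uptime`.
    """
    all_down = (1.0 - site_uptime) ** copies
    return 1.0 - all_down


# Three sites at two nines each: all three are down with probability 0.01 ** 3.
three_sites = replicated_availability(0.99, 3)
print(f"{three_sites:.6%}")
```

Correlated failures (a shared provider, a shared bug) would make the real number worse, which is part of why engineering past this point buys little.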
With erasure codes, you could make it so that each backup site only had to store half as much stuff, but that two of them would still need to be up to recover data. If you then have four backup sites, there's a savings of 1/3 of the storage versus the much more straightforward approach. This is a pretty small reduction given that the price of mass storage is very small and plummeting rapidly. It also comes at great expense: you have to deal with four backup sites instead of three, and the software is much more complicated. In systems like this, the recovery software not working is a significant part of the chances of the system as a whole failing. Also, any economic benefit of savings on disk space must be weighed against the costs of the software system which runs it. Given the ludicrous prices of backup systems these days, a much simpler albeit slightly less efficient one would probably be a great value.
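The storage comparison can be checked with a few lines. This is a sketch under the assumption that sites are up independently with 99% probability; the function names are illustrative, and the erasure-coded scheme is modeled simply as "any k of n sites suffice".

```python
from math import comb


def k_of_n_availability(site_uptime, n, k):
    """Probability that at least k of n independent sites are up."""
    return sum(
        comb(n, up) * site_uptime**up * (1.0 - site_uptime) ** (n - up)
        for up in range(k, n + 1)
    )


def storage_overhead(n, k):
    """Total bytes stored per byte backed up when any k of n sites can recover."""
    return n / k


replicated = storage_overhead(3, 1)  # three full copies
coded = storage_overhead(4, 2)       # four sites, each holding half
saving = 1.0 - coded / replicated    # fraction of storage saved by coding

print(f"storage: {replicated:.1f}x vs {coded:.1f}x, saving {saving:.0%}")
print(f"1-of-3 unavailability: {1.0 - k_of_n_availability(0.99, 3, 1):.2e}")
print(f"2-of-4 unavailability: {1.0 - k_of_n_availability(0.99, 4, 2):.2e}")
```

Under these assumptions the two-of-four scheme stores 2x instead of 3x, the 1/3 saving mentioned above, while being unavailable roughly four times as often as one-of-three, before counting any risk from the more complicated recovery software.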
ECC of course has some great uses, for example transmitting data over noisy channels and storing data on media which can get physically corrupted, and recent developments in it are very exciting, but it's very important to only use sophisticated tools when clearly warranted.
November 8 2004, 16:33:29 UTC 7 years ago
November 8 2004, 16:43:55 UTC 7 years ago
Anonymous
November 18 2004, 13:40:21 UTC 7 years ago
All well now
Not sure how the EPA site looked before, but as of now all the Civic Hybrid models (both US-wide and California) score in the top 5 (with only the Prius beating them), and only the conventional Civic is listed in the medium range. http://www.epa.gov/autoemissions/all-ran
Deleted comment
December 7 2004, 06:29:08 UTC 7 years ago
Re: just discovered your lj and added you
If I didn't want people to read what I wrote, I wouldn't post it to the internet.
November 11 2004, 23:09:20 UTC 7 years ago
However, the point of ECC backup is that you can get great reliability with 50%-reliable backup sites, such as DSL-connected machines in Seoul or Comcast subscribers in Ohio. Being able to tolerate 10 failures out of however many backups gives you 99% reliability; 30 gives you your "overkill" six nines. Tolerating 30 failures by duplicating data gives you a 31x cost multiplier; tolerating 30 failures by spreading your data across 300 peers with an error-correcting code that can tolerate 30 failures only increases your cost by 11%, which is noticeably less than 3000%.
In many cases the dominant cost will be data transfer rather than storage. Disks now cost $1 a gigabyte, and they can be erased and used again, but my DSL line costs about $0.15 a gigabyte, and every gigabyte of bandwidth used is used forever.
I agree with your point that simpler and working trumps theoretically superior but unimplemented or broken every time, though. And you would, I suppose, be the person to ask whether HiveCache fits in that elusive third category: superior but working.
Do you suppose Zooko's going to ship MNet any time soon?
November 23 2004, 03:53:20 UTC 7 years ago
I don't expect either hivecache or mnet to be usable in a corporate setting any time soon.
Anonymous
November 29 2004, 06:43:41 UTC 7 years ago
Suppose the undetectable error rate for one piece is x%, where x is a very small number. If 1TB is transmitted between peers (a 1GB file downloaded by 1000 people, or a 4GB file downloaded by 250 people) with an average piece size of 1MB, we have 1,000,000 * x% pieces which are in error but undetected. If x% is 0.0001%, i.e. 99.9999% reliable, we expect to have 1 undetected bad piece. Not bad!
But if the files are still being shared, the error will spread among people and no one can tell which copy is corrupted. Another 1000 downloaders will add another bad piece, and the errors are cumulative.
For some files, a small error may not be a problem. In an MPEG file, say, it may affect the movie for a fraction of a second. Who notices and who cares? But some files may not tolerate errors at all; a password-protected .rar file may fail to open in such a case.
It may be a good idea for BitTorrent to consider special cases, like a huge number of people sharing some files or sensitive files, and provide optional extra error detection at extra expense in the sharing, not necessarily by ECC.
November 29 2004, 06:57:17 UTC 7 years ago
Hashes for detecting errors and re-requesting are clearly valuable, and are used in TCP as well. The question is whether adding codes which can fix, and not just detect, the errors is worthwhile, and the answer is apparently no.
Anonymous
November 30 2004, 02:11:56 UTC 7 years ago
I haven't gone into the details of BitTorrent's data integrity checking algorithm. But my understanding is that no error detection or correction algorithm can reach 100%. Here are 3 cases we may have to pay attention to:
1. While most of the transmission links in the world are highly reliable, we cannot rule out the possibility that someone is using an error-prone link with BitTorrent. It doesn't matter if the person using the error-prone link has a lot of re-sends and slow transfers; he/she deserves it. But if it introduces errors into the shared copy on the Internet, we do not deserve this!
2. As data transmission volume is increasing rapidly, who can tell whether a few years from now the reliability of the current detection will be enough to ensure that no undetected errors get redistributed? Today 100TB per day may be a safe estimate, but who knows when GB will become outdated like MB. Some day such errors may become noticeable.
3. The worst case may be someone intentionally "injecting" bad data that can pass the current error detection algorithm in order to destroy the shared files. If I were from Hollywood or the record companies, I might try to hire someone to write a clever program to modify data pieces in this way and redistribute them to other peers. It could be a way to stop people from sharing!
I am actually concerned about the flexibility of the error detection. If BitTorrent, or better said the protocol, allowed people to add their own integrity checking, the undetected-but-in-error rate could easily be reduced to any level needed.
November 30 2004, 02:29:30 UTC 7 years ago
Your understanding is incorrect. That's the whole point of secure hashes. BitTorrent uses sha-1.
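As a rough illustration of what a secure-hash check on a piece looks like, here is a minimal sketch using Python's standard `hashlib`. The function name and the way the expected digest is passed in are assumptions for the example, not BitTorrent's actual code.

```python
import hashlib


def piece_is_valid(piece_bytes, expected_digest):
    """Return True if the SHA-1 of the received piece matches the published digest.

    expected_digest is assumed to be the 20-byte SHA-1 published for this piece.
    """
    return hashlib.sha1(piece_bytes).digest() == expected_digest


# A piece that fails the check is thrown away and requested again.
published = hashlib.sha1(b"original piece contents").digest()
print(piece_is_valid(b"original piece contents", published))
print(piece_is_valid(b"original piece c0ntents", published))
```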
June 22 2005, 03:16:49 UTC 6 years ago
You probably are already aware of Justin's findings - but it would be interesting to know your opinion on that ...
December 7 2004, 03:59:46 UTC 7 years ago
Erasure Codes
Hi Bram, I partly agree with your comments on erasure codes.
Yes, as you said, an encoded block may already have been downloaded from another peer, so in that case erasure codes are no better than BT swarming. And because the recovery scheme relies on redundancy, erasure codes can't recover files either when not enough blocks have been received.
However, on some other points I can't agree with your views.
1. Most ECCs are better when implemented simply than when implemented in a complicated way, but not all ECCs are like that. One kind of erasure code, called Luby codes, which uses UDP as its transport protocol, is very suitable for multipoint downloading with a large number of peers. It has good scalability and high recovery efficiency and is based on XOR operations on data blocks, so it needs only few computing resources and can be simply implemented on ordinary personal computers.
2. High storage consumption. There are some applications which strongly desire very high reliability on the current unreliable Internet. Compared to bandwidth costs, storage costs are much lower. So a tradeoff between storage and bandwidth exists, and people will choose one of the two.
Most of all, Luby codes can get rid of the traditional re-requests of TCP transmission. When receivers find errors in encoded blocks, these blocks are discarded without re-requesting. This can improve efficiency, because TCP transmission is affected by the RTT value, which limits the efficiency of the TCP protocol.
As discussed above, I think that erasure codes really can help improve swarming efficiency.
Do you agree with me? Any comments from you are welcome.
December 7 2004, 06:28:07 UTC 7 years ago
Re: Erasure Codes
Internet data transfer algorithms require acks for congestion control anyway. Adding some information to those acks about what packets didn't make it adds no overhead. Under some conditions increasing the window size and massaging some other parameters can increase performance, but such tweaking will always perform just as well as and be a lot more utilitarian than transfer algorithms based on ECC.
December 7 2004, 07:17:02 UTC 7 years ago
Re: Erasure Codes
The TCP protocol changes the window size according to changes in the RTT value. If the RTT value grows larger, it is assumed that congestion must have happened. However, there are in fact two cases in which the RTT value grows larger: one is when congestion really has happened, the other is when the distance between the two sides of the connection grows longer. So although there is enough bandwidth left, the TCP transmission efficiency isn't high, all the more so because the condition of connections between peers is much worse than between servers.
Erasure codes help swarming work well under bad connection conditions, don't they?
Anonymous
December 8 2004, 16:30:17 UTC 7 years ago
Good point
(from Federico Sacerdoti)
Excellent points about using complex software and algorithms for diminishing gains. It's great to see someone being honest about the difficulty and reliability of real software, and factoring that into the N-nines calculations.
Until we can prove implementations correct, we had better think long and hard about getting fancy. (I should take this advice myself occasionally).
-Federico
June 22 2005, 03:33:46 UTC 6 years ago
Here is one paper from Berkeley praising the erasure codes: http://classes.eclab.byu.edu/601/wiki/p
And here's another paper from MIT looking at costs of replication vs erasure: http://www.pmg.lcs.mit.edu/~rodrigo/p
It would be interesting to discuss that.
Anonymous
June 22 2005, 05:10:06 UTC 6 years ago
why are we worrying about those particular ideas?
isn't the whole reason bittorrent works better than its predecessors that, instead of trying to invent ways to add a few bits here and there to download speed, it adds new sources? i think a better topic to pursue would be the ability to remember peers you encountered in the past, even for different downloads. this could speed up bittorrent transfers by reducing 'leeching' even more, particularly because it would tend to favor users who shared more data over time over those who didn't; perhaps ranking remembered users by age of last connection, number of times met, and how 'profitable' it was for the current user to have them as a peer.
June 22 2005, 07:18:14 UTC 6 years ago
Re: why are we worrying about those particular ideas?
What you're saying is true for content that is wanted by many different people, assuming that the content is non-private and/or popular.
Imagine it's a video of your baby making first steps.
And you want it to stay in the network for a long time.
June 22 2005, 06:54:15 UTC 6 years ago
ARQ vs FEC
Working in the digital radio business as I do exposes you to a lot of coding theory. The basic tradeoff between FEC and ARQ is very simple. Both are forms of error control coding, and each has a regime in which it works better than the other.
FEC tends to beat ARQ when 1) bit error rates are very high and errors are randomly distributed; 2) latencies are very high; 3) you have many listeners to one transmitter (e.g., in broadcasting); 4) you can tolerate non-negligible residual error rates.
ARQ tends to beat FEC when 1) error rates are low and errors occur in long bursts; 2) latencies are small; 3) you're running a point-to-point link where requesting a retransmission is no big deal; 4) you need a very low residual error rate.
ARQ and FEC are often used in combination, but when they are you invariably have ARQ on top of FEC, not the other way around. You take a noisy channel and reduce the error rate to a low but nonzero value, often changing the statistics of those errors from random bit errors to relatively long bursts of erasures, and then you clean up those erasures by requesting retransmissions. In practice, FEC is most often implemented down in modems, and you rarely see it at the level of an Internet protocol.
That doesn't mean FEC is never a good idea in an Internet protocol, it's just very unusual for it to be. One important exception is reliable multicast because of the inherent advantages of FEC in a broadcast environment. Bit Torrent can be thought of as providing a multicast service, but since it is implemented on a set of point-to-point links, ARQ on those links is the preferred method.
Now if we actually had a ubiquitous IP multicast network, we either wouldn't need Bit Torrent, or it would look very different than it does.
June 22 2005, 07:32:36 UTC 6 years ago
Re: ARQ vs FEC
latencies ... IMHO we are within a factor of 2 of the speed of light (in fiber, which is 60% of the speed in vacuum) for most connections ... and I would love to see coast to coast latencies of less than 50 ms - but I doubt it's possible short of quantum entanglement stuff
and think about the previous comment - switches have finite buffers (cisco: 40 input, 70 output packets) which are extremely small ...
March 23 2006, 12:52:09 UTC 6 years ago
March 23 2006, 18:51:45 UTC 6 years ago
March 23 2006, 20:47:46 UTC 6 years ago
March 24 2006, 06:18:37 UTC 6 years ago