www.ice-graphics.com Forum Index
The main forum for the ICE-Graphics software
 

Is ICE-ECC Par3 + Please improve interface
Page 2 of 2
 
Glyph
Joined: 05 Jun 2005
Posts: 31
Posted: Wed Aug 03, 2005 1:51 am

To mrQQ:

The problem is actually more fundamental than that.

Even if you split the data and package/organize/mess with it in any way, as long as you have 20% total damage, you NEED 20% redundancy to fix it.

You can't retrieve data you don't have; that's a fundamental part of information theory.

Here's why it can't work:

You've got your 100 MB of data and 10% redundancy, i.e. 10 MB of ECC. So now you've got 110 MB of "stuff" in total. So far so good.

Now you destroy 20 MB of data, so you've got 80 MB left plus the 10 MB of ECC. That's 90 MB of "stuff" left. The goal now is to get 100 MB of data out of 90 MB of stuff.

IT CAN'T BE DONE! If it could, you would have just defined a new means of data compression that can recursively compress anything down to 1 byte (the internet is full of peddlers of such algorithms, kind of like perpetual motion).
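
Here's the counting rule as a minimal sketch, assuming an ideal code like Reed-Solomon (the numbers are just the ones from this example):

Code:
# Counting argument for an ideal erasure code such as Reed-Solomon.
# Sizes are in MB; recovery is possible only if the surviving "stuff"
# (undamaged data + ECC) is at least as big as the original data.

def recoverable(data_mb, ecc_mb, damaged_mb):
    surviving = (data_mb + ecc_mb) - damaged_mb
    return surviving >= data_mb

print(recoverable(100, 10, 10))   # True  - 100 MB of stuff left for 100 MB of data
print(recoverable(100, 10, 20))   # False - only 90 MB of stuff left, can't be done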


Now for the second problem: trying to convert the 10 MB of ECC into 10 MB of useful data on top of the 80 MB you already have.

It CAN be done... but not perfectly.

Having less than the minimum amount of ECC is like having a crossword puzzle where some of the clues are missing and only part of the puzzle has been solved. Using the clues and the parts already solved, you can solve other parts of the puzzle, but there will be gaps. The problem is that the parts you do solve will be at random places, useless for retrieving any file bigger than 10 bytes or so.

You can't select which block of data you get back, because if you could, then you could select, say, 1 MB in one run, then another 1 MB in another run, and keep going until you got back all 20 MB, essentially violating the conservation of information rule I stated earlier.

In addition, even if Mr. ICE Graphics wanted to put in the solving algorithms that get back small amounts of data from incomplete sets, he can't: those algorithms are patented. He'd have to start paying for them and ICE ECC wouldn't be free anymore.


As for the method you suggested:

It's called interleaving, and although it can work to retrieve small amounts of data AND it's not patented, it suffers from a major drawback:

If the damage occurred, say, all in one set, then you would have a situation where 10 MB of data were damaged and 10 MB of ECC is available, but you could NOT fix it, because 9 MB of that ECC belongs to other sets and not to the set that was damaged. So in this case it's even WORSE than going with a straight 100 MB set, because the straight 100 MB of data/ECC could still have been fixed.

If you really want it, you can do interleaving manually by splitting your data and creating ICE ECC files separately for each block.
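
Roughly like this, as a sketch of the splitting step only (the 10 MB chunk size and the .000/.001 naming are just an illustration; you still create the ICE ECC set for each piece yourself):

Code:
# Split one big file into fixed-size pieces so each piece can get its own
# ICE ECC recovery set. The 10 MB chunk size is only an example.

CHUNK = 10 * 1024 * 1024  # 10 MB per piece

def split_file(path):
    """Write path.000, path.001, ... each at most CHUNK bytes,
    and return the list of piece names."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            block = src.read(CHUNK)
            if not block:
                break
            part = "%s.%03d" % (path, index)
            with open(part, "wb") as dst:
                dst.write(block)
            parts.append(part)
            index += 1
    return parts

# Afterwards, protect each piece separately with ICE ECC; to restore the
# original file, just concatenate the pieces back together in order.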
mrQQ
Joined: 08 Jul 2005
Posts: 10
Posted: Wed Aug 03, 2005 5:49 am

Quote:
Even if you split the data and package/organize/mess with it in any way, as long as you have 20% total damage, you NEED 20% redundancy to fix it.


well of course you do! did I ever say otherwise? you can't make up stuff you don't have! I thought that was more than obvious?

Quote:
If the damage occurred, say, all in one set, then you would have a situation where 10 MB of data were damaged and 10 MB of ECC is available, but you could NOT fix it, because 9 MB of that ECC belongs to other sets and not to the set that was damaged. So in this case it's even WORSE than going with a straight 100 MB set, because the straight 100 MB of data/ECC could still have been fixed.


that's why I said "perhaps combine it with full-set ECC data", so that you can still recover in that case.

in general - you haven't said anything new, just pointed out flaws in my idea, which I already know about. what I want is some constructive thinking - perhaps there IS some other workaround which can help (at least partially) in such cases.

as for patents, I'm sure there are loads of patents for Reed-Solomon codes as well, but that doesn't seem to be an issue.
ICE Graphics
Site Admin
Joined: 31 Mar 2003
Posts: 430
Posted: Wed Aug 03, 2005 9:22 am

mrQQ wrote:
say the same example - 100 MB of data, 10% redundancy, 20 MB of damaged data.

what if we split the source data into 10 sets of 10 MB - S1..S10. We get 10 sets of redundant data as well - R1..R10, each 1 MB (those can be compiled into one file perhaps, or left separate, doesn't matter). Now, this doesn't help at all if the damaged data is contiguous, as for a single set there is only 10% redundancy. But if the damage is scattered, it could perhaps be possible to save some data? Now that I think about it, it doesn't sound so easy.. but perhaps a combination of split-set redundancy with full-set redundancy could work? This is just an idea for a work-around, I'm sure you or someone else can come up with something even smarter!

Yes, your solution can help sometimes. The main question: do you need to recover the whole data set, or is recovering just part of the data also valuable to you?

Case 1. You are planning to protect ISO images or archives, where any damage makes them useless. The best strategy is to use ICE ECC the way it works now.

Case 2. You store photos or music files - a lot of files. If some files are damaged, it's not fatal. In that case your strategy can win over the ICE ECC strategy. However, there is a serious drawback in your strategy. Example: you have 2 MB of defects. If you are using the default ICE ECC strategy, all data would be recovered (global 10 MB ECC block). Your strategy of splitting into sets will fail if the damage is localized in a single set, because every set has only 1 MB of ECC data. Of course, if the data corruption is scattered over the whole disk, your strategy will succeed. Time to think...
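
A toy model of the difference, counting in 1 MB blocks and assuming an ideal code inside every set (only an illustration):

Code:
# Toy model: 100 MB of data, 10 MB of ECC total, counted in 1 MB blocks.
# A set can be repaired only if its damaged blocks do not exceed its ECC blocks.

def global_strategy_ok(damaged_mb, ecc_mb=10):
    return damaged_mb <= ecc_mb

def split_strategy_ok(damage_per_set, ecc_per_set_mb=1):
    # damage_per_set: damaged MB in each of the 10 sets
    return all(d <= ecc_per_set_mb for d in damage_per_set)

# 2 MB of defects localized in a single set:
print(global_strategy_ok(2))                              # True
print(split_strategy_ok([2, 0, 0, 0, 0, 0, 0, 0, 0, 0]))  # False - that set has only 1 MB ECC

# 2 MB of defects scattered over two sets:
print(split_strategy_ok([1, 1, 0, 0, 0, 0, 0, 0, 0, 0]))  # True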
mrQQ
Joined: 08 Jul 2005
Posts: 10
Posted: Wed Aug 03, 2005 10:38 am

ICE Graphics wrote:
mrQQ wrote:
say the same example - 100 MB of data, 10% redundancy, 20 MB of damaged data.

what if we split the source data into 10 sets of 10 MB - S1..S10. We get 10 sets of redundant data as well - R1..R10, each 1 MB (those can be compiled into one file perhaps, or left separate, doesn't matter). Now, this doesn't help at all if the damaged data is contiguous, as for a single set there is only 10% redundancy. But if the damage is scattered, it could perhaps be possible to save some data? Now that I think about it, it doesn't sound so easy.. but perhaps a combination of split-set redundancy with full-set redundancy could work? This is just an idea for a work-around, I'm sure you or someone else can come up with something even smarter!

Yes, your solution can help sometimes. The main question: do you need to recover the whole data set, or is recovering just part of the data also valuable to you?

Case 1. You are planning to protect ISO images or archives, where any damage makes them useless. The best strategy is to use ICE ECC the way it works now.


actually, not necessarily - an ISO image would still work with some damage as well, only some of the files in it would be unreadable. same for archives: when you try to recover, you would get back the files that could be fixed (if any).

Quote:
Case 2. You store photos or music files - a lot of files. If some files are damaged, it's not fatal. In that case your strategy can win over the ICE ECC strategy. However, there is a serious drawback in your strategy. Example: you have 2 MB of defects. If you are using the default ICE ECC strategy, all data would be recovered (global 10 MB ECC block). Your strategy of splitting into sets will fail if the damage is localized in a single set, because every set has only 1 MB of ECC data. Of course, if the data corruption is scattered over the whole disk, your strategy will succeed. Time to think...


or, for example, a long movie.. any recovery on that is useful, because you can watch more of it then, right.

but you're right, my suggestion isn't that good, it would only work rarely. that's why I said perhaps combine them, or come up with something even better..

btw, creating two sets for 500 MB of data is faster than one set for 1000 MB, right? for the same settings?
Glyph
Joined: 05 Jun 2005
Posts: 31
Posted: Wed Aug 03, 2005 2:47 pm

Well, sorry if I'm not telling you anything new. Your post wasn't very clear about what you did and didn't know, so the suggestions you made implied you weren't fully aware of their inherent flaws.


As for patents: Reed-Solomon codes were invented back in the 1960s and are no longer protected by patent law, so free rein there. In addition, Plank's algorithm for erasure decoding is free under the LGPL (although I think Mr. ICE Graphics is using a much faster algorithm).

The more modern decoding algorithms that can reconstruct incomplete data were invented in the late '70s and '80s and thus are still patented.

This is not to say we're at a dead end; there are other codes that are very good at reconstructing lost data from incomplete sets.

Now, as stated before, there is no code in the universe that can make something from nothing. But there are some non-Reed-Solomon codes that, under the best conditions, can take 10 MB of ECC and convert it straight into 10 MB of data within a single set. That's something Reed-Solomon codes never do, even with the modern algorithms.

Some of these codes are even in the public domain, like LDPC codes, which were also invented in the 1960s.

However, these codes suffer from a major drawback: they are not "Minimum Distance Separable", which is a fancy mathematical term (I hate math because of this... :) ) meaning you can't guarantee in all cases that you will get perfect data recovery with 100% total data and ECC. So there will be cases where you've got 10 MB of ECC and 90 MB of data but you STILL can't fix anything, because all the damage happens to land on just the right spots to ruin the math. In fact, there are some cases where you can have 120% total data/ECC and still not get everything back. Compared to Reed-Solomon, this is a disaster.
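
Here's a toy illustration of what "not MDS" means in practice (this is NOT real LDPC, just two little XOR parity checks, but it shows how enough total ECC can still fail if the damage lands in the wrong spots):

Code:
# Toy non-MDS code: four data blocks d1..d4 and two parity blocks,
#   p1 = d1 ^ d2      p2 = d3 ^ d4
# There are 2 blocks of ECC in total, but each check can only rebuild
# ONE missing block from its own half.

def fixable(lost):
    # lost: the set of erased data blocks among "d1".."d4"
    return len(lost & {"d1", "d2"}) <= 1 and len(lost & {"d3", "d4"}) <= 1

print(fixable({"d1", "d3"}))  # True  - one erasure per check, both repairable
print(fixable({"d1", "d2"}))  # False - two erasures hit the same check;
                              #         a 2-block Reed-Solomon code would fix this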

However, these codes are better than interleaving, and on AVERAGE they can yield VERY good recovery rates. They're also extremely fast compared to Reed-Solomon.

In fact, such codes (Turbo codes, LDPC codes, etc.) are already being used in cell phones and satellite communication, where the idea is to get "most of the data most of the time". The requirement of "all the data all the time" is much more relaxed there. And in the case of satellite communication, you can't add more ECC without sacrificing bandwidth, which costs A LOT on a satellite. So "most of the data most of the time" is preferable. For comparison, a hard drive has to deliver "all the data all the time", so hard drives have Reed-Solomon integrated into the electronics; the drive is simply thrown out when it gets corrupted.

If you're after "most of the data most of the time", then go with LDPC; if you want "all the data all the time", then go with Reed-Solomon.

This is probably obvious but I'll mention it anyway: the codes are not interchangeable. You can't decode with LDPC what you encoded with Reed-Solomon, so you have to use LDPC from the start if you want it.

The thing is, our purpose here is long-term file storage. If you really want long-term reliability, why not just add more ECC? DVD media is cheap (unlike satellite bandwidth).

But maybe I'm assuming too much. Tell us: what are your requirements? Are you transmitting over a bandwidth-limited channel? If so, we can suggest some better codes than Reed-Solomon for that purpose.
mrQQ
Joined: 08 Jul 2005
Posts: 10
Posted: Wed Aug 03, 2005 3:55 pm

I have no requirements - I just like developing thoughts of "what could happen" and "what can we do about it" :) that's why I suggested header redundancy, the sliding-window search for data, etc, etc. This is just another theoretical situation (for example, we store video files - it would be nice to be able to recover everything possible, without absolutely needing to recover everything).

So what about combining Reed-Solomon and the codes you mentioned? The simplest way would be to have both.. but I think there is always something better, right..? :)
ICE Graphics
Site Admin
Joined: 31 Mar 2003
Posts: 430
Posted: Wed Aug 03, 2005 5:53 pm

mrQQ wrote:
actually, not necessarily - an ISO image would still work with some damage as well, only some of the files in it would be unreadable. same for archives: when you try to recover, you would get back the files that could be fixed (if any).

Usually almost all the data in an ISO image is used during installation (more than 90%). And during installation the CRC check does not allow installing from a damaged CD. Of course, that's if it's a data CD.


mrQQ wrote:
but you're right, my suggestion isn't that good, it would only work rarely. that's why I said perhaps combine them, or come up with something even better..

Combining can help in some worst cases. Of course, combining can be useful only for media files or for separate data files.

mrQQ wrote:
btw, creating two sets for 500 MB of data is faster than one set for 1000 MB, right? for the same settings?

Yes, it's faster. But in the worst case it has half the recovery ability.
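
A rough estimate of why, assuming a straightforward Reed-Solomon encoder where every recovery block is computed from every source block (the real implementation can differ):

Code:
# Encoding work ~ (number of source blocks) x (number of recovery blocks),
# with a fixed block size. Redundancy is 10% in both cases.

def encode_work(data_mb, redundancy, block_mb=1):
    source_blocks = data_mb / block_mb
    recovery_blocks = source_blocks * redundancy
    return source_blocks * recovery_blocks

print(encode_work(1000, 0.10))      # 100000.0 work units for one big set
print(2 * encode_work(500, 0.10))   # 50000.0  - two half-size sets, about twice as fast,
                                    #            but each fixes only half as much localized damage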
ICE Graphics
Site Admin
Joined: 31 Mar 2003
Posts: 430
Posted: Wed Aug 03, 2005 6:32 pm

Glyph wrote:
This is not to say we're at a dead end; there are other codes that are very good at reconstructing lost data from incomplete sets.

I am not sure that there are a lot of good codes. Let's add some filters, some required code specifications:
1. The code has to work with a very large block count, at least 4096 blocks. Or, if it is a blockless code, it has to work with a very large data size, at least 10 GB.
2. The code cannot modify the source data. Some kinds of linear codes (and not only linear ones) work like a sort of archiver. That's fine for some hardware (cell phones, the hardware level of CD/DVD, protected data channels), but it's not good for protecting data on CD/DVD/Usenet/FTP/HTTP.

Glyph wrote:
Now, as stated before, there is no code in the universe that can make something from nothing. But there are some non-Reed-Solomon codes that, under the best conditions, can take 10 MB of ECC and convert it straight into 10 MB of data within a single set. That's something Reed-Solomon codes never do, even with the modern algorithms.

Reed-Solomon can also find such a "best condition", but it's absolutely useless. Try to solve a system of N linear equations with M unknown variables, where M > N: you can find a lot of solutions, but they are useless to you. Why does this happen with Reed-Solomon codes? Because Reed-Solomon is a TRUE global code. That means any N damaged blocks can be recovered using only N (or N+1 in some worst cases) recovery blocks. No linear code can do that.
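
A trivial illustration of that point: one surviving equation that ties together two erased values pins down neither of them:

Code:
# One equation, two unknowns: x1 + x2 = 7. Every pair below satisfies it,
# so a decoder has no way to tell which pair is the real data.

candidates = [(x1, 7 - x1) for x1 in range(8)]
print(candidates)   # (0, 7), (1, 6), ..., (7, 0) - many solutions, none usable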


Glyph wrote:
In fact, there are some cases where you can have 120% total data/ECC and still not get everything back. Compared to Reed-Solomon, this is a disaster.

120% is too optimistic a bound. Theory says there is no theoretical bound at which recovery can be guaranteed. That means even with 300% redundancy you cannot guarantee data recovery. However, the probability of recovering the data with 300% redundancy can be high, like 99.5%.


Glyph wrote:
However, these codes are better than interleaving, and on AVERAGE they can yield VERY good recovery rates. They're also extremely fast compared to Reed-Solomon.

It's better only when the error rate does not exceed the maximum allowed value. It's better for data channels with a constant error rate. Real life is different. There are some important classes of errors which need to be fixed:
1. Scratches or surface degradation on CD/DVD: high peak error rate.
2. Lost files: also a high peak error rate.

Most known linear codes work very well for low error rates near 0.1 (not more than 0.3). Correct me if I am wrong.


Glyph wrote:
In fact, such codes (Turbo codes, LDPC codes, etc.) are already being used in cell phones and satellite communication, where the idea is to get "most of the data most of the time".

That is the perfect place for linear codes, because:
1. All those channels have a constant error rate.
2. These codes are usually hard-coded, I mean fixed configurations, like a data size of 256 bytes and an ECC size of 16 bytes.
3. Data channels usually use a small block size. It's easy to make a code for 256 bytes of data, but it's not a trivial task to make a linear code for 4 GB of data.


Glyph wrote:
If you're after "most of the data most of the time", then go with LDPC; if you want "all the data all the time", then go with Reed-Solomon.

Linear codes are faster than Reed-Solomon codes in some cases, but linear codes have less recovery capability. What is most important to you: computation time or the percentage of recovered info? For me the percentage of recovered info is most important. So I prefer only global codes, like Reed-Solomon.


Glyph wrote:
But maybe I'm assuming too much. Tell us: what are your requirements? Are you transmitting over a bandwidth-limited channel? If so, we can suggest some better codes than Reed-Solomon for that purpose.

Agreed. For data channels, linear codes are the best solution right now. But not for CD/DVD storage or FTP/WWW/Usenet.
Glyph
Joined: 05 Jun 2005
Posts: 31
Posted: Wed Aug 03, 2005 9:15 pm

To ICE Graphics:

Quote:
Reed-Solomon can also find such a "best condition", but it's absolutely useless. Try to solve a system of N linear equations with M unknown variables, where M > N: you can find a lot of solutions, but they are useless to you. Why does this happen with Reed-Solomon codes? Because Reed-Solomon is a TRUE global code. That means any N damaged blocks can be recovered using only N (or N+1 in some worst cases) recovery blocks. No linear code can do that.


I NEVER said any linear code could recover any data, global or not. I said that under the "BEST CONDITIONS" the code can restore data straight through. What I'm implying by best conditions is the one situation where the ECC data happens to fall at exactly the right positions to restore the data. A brief (although crude) example is simple XOR parity. We have two sets of data, say 50 MB each for 100 MB in total, and we create 50 MB of parity from the two sets. Under the BEST CONDITIONS, 50 MB of data from one set is destroyed, leaving 50 MB of the other set and 50 MB of parity data. In that case you can restore the 50 MB of data straight through. And even if 40 MB of the other set and the parity set were destroyed, you could still restore 10 MB under the BEST CONDITIONS. Now, if that 10 MB you restored happened to contain what you wanted to get back, then you have success. Most of the time it does not.
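
Here's a bite-sized version of that XOR example, with short byte strings standing in for the 50 MB sets:

Code:
# Two "sets" of equal length and one parity set made by XOR-ing them.

set_a  = b"HELLOWORLD"
set_b  = b"0123456789"
parity = bytes(a ^ b for a, b in zip(set_a, set_b))

# Best conditions: all of set_a is lost, but set_b and the parity survive,
# so the parity converts straight into the missing data.
recovered_a = bytes(p ^ b for p, b in zip(parity, set_b))
print(recovered_a == set_a)   # True

# If pieces of BOTH set_b and the parity are gone, only the positions where
# both happen to survive can be rebuilt - and those positions fall wherever
# the damage allows, not where you actually need them.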

I stress again, I NEVER SAID that a linear code could restore any given piece of data better than a Reed-Solomon code. I only said that under certain very rare (and mostly useless) circumstances a linear code could work where a Reed-Solomon code could not.


Quote:
120% is too optimistic a bound. Theory says there is no theoretical bound at which recovery can be guaranteed. That means even with 300% redundancy you cannot guarantee data recovery. However, the probability of recovering the data with 300% redundancy can be high, like 99.5%.


Of course; sorry if I implied otherwise.

Quote:
It's better only when the error rate does not exceed the maximum allowed value. It's better for data channels with a constant error rate. Real life is different. There are some important classes of errors which need to be fixed:
1. Scratches or surface degradation on CD/DVD: high peak error rate.
2. Lost files: also a high peak error rate.

Most known linear codes work very well for low error rates near 0.1 (not more than 0.3). Correct me if I am wrong.


Sorry if I wasn't clear; I realize I didn't write enough.

What I meant by "average" is that the average cases in industry mostly involve radio communications, where such codes are fast and efficient. For computer archiving, where files are mostly lost in whole chunks, such codes barely work, if at all.

Quote:
Linear codes are faster than Reed-Solomon codes in some cases, but linear codes have less recovery capability. What is most important to you: computation time or the percentage of recovered info? For me the percentage of recovered info is most important. So I prefer only global codes, like Reed-Solomon.


So do I. All of my data must be perfect or it's completely useless.


To mrQQ:

Quote:
So what about combining Reed-Solomon and the codes you mentioned? The simplest way would be to have both.. but I think there is always something better, right..? :)


Although combining them can be done, it doesn't accomplish much:

Let's say you've got 100 MB of data; you now add 50 MB of Reed-Solomon code and then another 50 MB of LDPC code. You've got 100 MB of ECC but NOT 100% redundancy, only 50%, maybe 60% if you get REALLY lucky. After you lose 70%, you might be able to restore 5% using the LDPC, so now you've got 35% of your old data back.

But as you sit there with 135 MB of data/ECC, you'll wonder: "If I had gone Reed-Solomon all the way, I would have had complete restoration. If I had gone with LDPC all the way, I would have gotten back more than just a crummy 5%."

As for something better: Reed-Solomon is the BEST GENERAL PURPOSE code, because it works right at the limit of coding, the "something for nothing" rule. As long as the code plus the data you still have adds up to 100% of the original, Reed-Solomon will always work. This is absolute perfection in terms of coding.

But for special purposes, Reed-Solomon is not the best and can actually be worse, such as satellite communication, where it's better to have most of something than all of nothing.

But that's the point: special-purpose codes are for special purposes, and without defining a requirement we can't improve an existing code for it. Any improvement in one area will make it terrible in other areas. Just as LDPC codes are great for cell phones or satellites, where errors are constant and random, Reed-Solomon codes are better for computers, where failures are chunky and predictable.

So there is no general-purpose code for everything that's better than Reed-Solomon. If you want something that works for a special purpose, then I'm sure we can find something better than Reed-Solomon for that purpose.

Is video encoding a special purpose you want to improve on?