
Here are some ideas #36

Open
AlfredoSequeida opened this issue Feb 19, 2021 · 34 comments

@AlfredoSequeida
Owner

AlfredoSequeida commented Feb 19, 2021

Hi everyone, I've been busy, which is why I haven't been able to check in as often. So first of all, I want to thank everyone for their contributions, and a special thanks to @Theelgirl and @dobrosketchkun since I know you two have been putting a lot of work into the program. Your work does not go unnoticed.

I have some ideas I want to share and get some feedback on.

  1. Increasing storage capacity.
    Originally, the program was made with the simple thought that 1-bit (black and white) pixels would allow the program to be more efficient at data retrieval in the face of compression algorithms. But of course, using a single pixel to represent a bit leaves a lot to be desired as far as maximizing storage goes. I think we can keep the same logic by adding the option for another type of encoding (in addition to 1-bit color) that uses colors to represent a set of 2 bits per pixel, thus doubling the storage capacity while still keeping the colors simple enough to guard against compression. Here is the math/logic:

Using the RGB color spectrum, the simplest colors are red, green, and blue. This means that they are easy to distinguish from one another even if compression changes them a bit.

With that in mind, using binary numbers we can double storage by storing 2 bits in a single color, giving us 2^2 or 4 different combinations. Then we can assign a color to each of three combinations and use black (or white) for the remaining one:

00: Black
10: Red
01: Green
11: Blue

Then as far as decoding goes, the logic would be the same as for black and white: we would check which color the value is closest to and assume that color:

For example, if the pixel is (255,12,30), then the color must be Red (bin: 10), since the pixel contains more red than anything else.

I haven't done the research yet, but we might also be able to take advantage of an alpha channel using the RGBA color spectrum, but I would assume that A might not be as easy to guard against compression.
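To make the proposed 2-bits-per-pixel scheme concrete, here is a minimal sketch (hypothetical helper names, not fvid's actual code) of encoding a bit pair as a palette color and decoding by nearest color:

```python
# Hypothetical sketch of the proposed 2-bits-per-pixel scheme.
PALETTE = {
    (0, 0): (0, 0, 0),      # 00 -> black
    (1, 0): (255, 0, 0),    # 10 -> red
    (0, 1): (0, 255, 0),    # 01 -> green
    (1, 1): (0, 0, 255),    # 11 -> blue
}

def encode_pair(bits):
    """Map a 2-bit tuple to an RGB pixel."""
    return PALETTE[bits]

def decode_pixel(pixel):
    """Recover 2 bits by choosing the palette color nearest to the
    (possibly compression-damaged) pixel, by squared distance in RGB."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    bits, _ = min(PALETTE.items(), key=lambda kv: sq_dist(kv[1], pixel))
    return bits

# The example from the text: (255, 12, 30) is still closest to red -> 10.
print(decode_pixel((255, 12, 30)))  # (1, 0)
```

The same nearest-color rule generalizes to any palette, so an RGBA variant would only need a fourth channel in the distance computation.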

  2. Adding some sort of error checking/correction algorithm
    Even though the current implementation of fvid seems to work the times I have tried it, it's not perfect, and there have been some reports of it not working with certain files. Because of this, I think we should look into adding some error checking or correction. A simple implementation might be a parity check, where we add an extra bit (or set of bits) for every byte (8 bits) indicating whether the count of 1s is odd or even. However, that doesn't fix the data; it only tells us that the data is wrong, and that assumes the data was decoded correctly, which are probably too many assumptions for it to be a good solution. So I am open to hearing any suggestions on this.

  3. GUI
    Recently, the program has been getting more attention (because of a TikTok I made, lol), and I have seen more requests for a GUI. I know @dobrosketchkun made a GUI for the program, but I have not seen that implemented yet. I have not made a GUI for a Python program before, so I was doing some research and was thinking of building one using PyQt or Kivy, but since @dobrosketchkun already did some work using Tkinter, I'd rather their work not go to waste. In addition, I think it would be cool to make this optional, so maybe during the install process or with a different package.

  4. Changing the License
    MIT was the original option since that is the most open license I know, and I like the idea of sharing the source code and allowing people to do whatever they want with the program, but I don't want the work contributed by others to be "taken advantage of", for lack of a better phrase. With the amount of time contributors have put into their work, I would like the program and any copies to remain open source. So I propose we change the license to GPL v3, but of course I am open to suggestions.
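Circling back to the error-correction item: the parity-bit idea mentioned there could look like this minimal sketch (hypothetical helper names; it detects a single flipped bit per byte but cannot locate or repair it):

```python
def add_parity(byte):
    """Append an even-parity bit so the 9-bit word has an even number of 1s."""
    parity = bin(byte).count("1") % 2
    return (byte << 1) | parity

def parity_ok(word):
    """True if the word still has even parity (no single-bit error detected)."""
    return bin(word).count("1") % 2 == 0

word = add_parity(0b10110000)   # three 1s -> parity bit is 1
print(parity_ok(word))          # True: data looks intact
print(parity_ok(word ^ 0b10))   # False: one flipped bit is detected
```

Note that two flipped bits cancel out, which is part of why parity alone is a weak guarantee here.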

Everything here is just a suggestion, I want to hear what others have to say.

@Theelx
Collaborator

Theelx commented Feb 19, 2021

  1. I think that's a great idea to find a way to increase storage capacity! However, I think that we can already represent 4 combinations with 2 pixels:
00: Black-Black
01: Black-White
10: White-Black
11: White-White
  2. That could work. I'm not the best in that area, but I'm sure it's possible.
  3. I'd personally like to build on the existing GUI from dobrosketchkun, as I'm not an experienced GUI maker with other tools.
  4. I personally couldn't care less if people stole my portions of the code, so I propose the Unlicense. I've used it for my other projects, and it's basically the most permissive license you can get. Let's wait for dobrosketchkun's advice first, though. https://unlicense.org/

Edit: Revised what I said in parts 1 and 4.

Theelx pinned this issue Feb 19, 2021
@dobrosketchkun
Contributor

  1. Nice idea. I thought about it, but digressed to another idea and forgot about it, lol
  2. Well, you can always use Reed–Solomon, but it will blow up the volume of the data
  3. My GUI is a hacky, partially working, lazy few lines of code, plz make something better
  4. I second @Theelgirl; I usually post my "code" under public domain

@Theelx
Collaborator

Theelx commented Feb 19, 2021

Dobro, for 2, error correction, have you heard of LDPC error correction? I was just googling it and came across this: https://pypi.org/project/pyldpc/

Edit: Apparently it's much faster than Reed-Solomon: https://stackoverflow.com/questions/41883385/error-correction-with-python-and-reed-solomon-for-large-inputs

@dobrosketchkun
Contributor

Nope, I haven't; it seems nice.

@AlfredoSequeida
Owner Author

Awesome!

@Theelgirl This sounds like a great opportunity to get some experience with GUIs, so I'll leave that to you. Take your time, of course. I'll also take a look at LDPC; I haven't heard of that before.

@dobrosketchkun and I have heard about Reed–Solomon but haven't implemented it before; I will also take a look at that.

As for the license, since neither of you really minds the licensing issue, I think we should just leave it as is. I'll leave this issue open for a bit longer in case anyone else who wants to contribute has something to say. Thank you both for your time!

@AlfredoSequeida
Owner Author

AlfredoSequeida commented Feb 21, 2021

Excuse my typos ahead of time, I'm on mobile.

I spent some time today implementing Reed-Solomon error correction, and I got it working with some quick tests; however, it really slows down the process.

There are a few more things I need to look at before pushing a commit. I have noticed that PIL has trouble when too many images are open: during the decoding process, I received a "too many images are open" error. I also want to see if I can find a way to avoid having to encode a video with a new frame rate before extracting frames, since that's making the process even longer.

I have also noticed that with larger files, zipping the files sometimes fails. I think this might be a memory issue, since I experienced a case where zipping failed the first time and, after running it again, it worked.

Since I don't have time to write the algorithm from scratch, I am using this library I found:

https://github.com/tomerfiliba/reedsolomon

And I am adding correction data to every byte (8 bits) and testing with an ecc value of 12. This makes it possible to recover all 8 bits if they were to become damaged (the worst possible scenario), but of course that comes at the cost of taking longer and producing larger output videos, hence the current issues I want to investigate more. I also plan to make this an optional feature with the -r flag (or something else, so it won't be mistaken for a 'recursive' feature) for the same reasons.
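For rough numbers on that configuration (standard Reed–Solomon properties, assuming the 12 ecc symbols really are attached to each data byte as described): classic RS corrects up to nsym/2 symbol errors at unknown positions, or up to nsym erasures at known positions, which is why a fully damaged data byte stays recoverable, at the cost of a large size increase:

```python
# Back-of-the-envelope arithmetic for RS with nsym=12 applied per data byte.
data_bytes = 1                  # one byte (8 bits) per codeword, as described
nsym = 12                       # ecc symbols added to each codeword
codeword = data_bytes + nsym    # 13 bytes stored per original byte
max_errors = nsym // 2          # up to 6 corrupted bytes, positions unknown
max_erasures = nsym             # up to 12 corrupted bytes, positions known
print(codeword, max_errors, max_erasures)  # 13 6 12
```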

If anyone has anything to add, please feel free to do so.

@Theelx
Collaborator

Theelx commented Feb 21, 2021

I still think that LDPC error correction will be better, as it's much faster, but we'll see. At the moment, I'm setting up some documentation; I will push when done.

@Theelx
Collaborator

Theelx commented Feb 21, 2021

@AlfredoSequeida Can I get integration perms for this GitHub repo so I can set up the readthedocs thing? I pushed the docs here already, but apparently they need a working integration to publish them :(

Edit: Looks like it just needs the webhook so it can automatically update the docs with every commit: https://fvid.readthedocs.io/en/latest/

@AlfredoSequeida
Owner Author

@Theelgirl I don't see an option on the repo to give you integration permissions, so I was trying to see if I can set that up from my end, but looking at https://docs.readthedocs.io/en/stable/webhooks.html#github I don't see the Payload URL.

@Theelx
Collaborator

Theelx commented Feb 22, 2021

@AlfredoSequeida I'll email it to you (your outlook email that's on your GitHub page).

@Theelx
Collaborator

Theelx commented Feb 22, 2021

Also I just added what I believe is your readthedocs account to the project. Is your account name AlfredoSequeida on readthedocs.io?

@AlfredoSequeida
Owner Author

Yes, that's me, I added the webhook.

@Theelx
Collaborator

Theelx commented Feb 22, 2021

Ok cool. Can you push your current version of the reed-solomon error correction to a new branch so I can check it out and possibly speed it up?

@AlfredoSequeida
Owner Author

Yes, I'll do that.

@AlfredoSequeida
Owner Author

Here is the current Reed-Solomon implementation: c0f3ecd

@Theelx
Collaborator

Theelx commented Feb 22, 2021

Cool thanks!

@AlfredoSequeida
Owner Author

AlfredoSequeida commented Feb 22, 2021

Quick GUI mockup idea. I had some extra time today and wanted to see what I could come up with. It's just a simple idea; I don't expect it to look exactly like that. I am not even sure about the log-in idea. I figure people can do the uploading on their own if they wish to, but I added it since that was the original premise.

fvid_gui_mockup

@AlfredoSequeida
Owner Author

AlfredoSequeida commented Feb 22, 2021

And going off @dobrosketchkun's idea, there could also be an option to use a URL to decode:

fvid_gui_mockup_enter_a_url

@dobrosketchkun
Contributor

This is a very nice GUI, miles better than mine.

@Theelx
Collaborator

Theelx commented Feb 22, 2021

@AlfredoSequeida @dobrosketchkun I did a good. This zfec error correction takes about 6-7 seconds to encode my 580KB test image versus over a minute with Reed-Solo, bloats it to a 103MB mp4 instead of >200MB, and takes about 2s to decode it.

Branch:
https://github.com/AlfredoSequeida/fvid/tree/zfec-error-correction
Zfec repo:
https://github.com/tahoe-lafs/zfec

@dobrosketchkun
Contributor

Splendid!

@AlfredoSequeida
Owner Author

AlfredoSequeida commented Feb 22, 2021

@Theelgirl That's awesome! When I get some time, I'll try the file I was using to test Reed-Solomon.

@Theelx
Collaborator

Theelx commented Feb 22, 2021

Sounds good! I haven't tested it on any error-ridden files yet, only normal files, so hopefully it works.

@Theelx
Collaborator

Theelx commented Feb 23, 2021

@AlfredoSequeida Have you had time to test it yet? I'm going to bed really soon unless I need to fix something.

@AlfredoSequeida
Owner Author

@Theelgirl I haven't yet, but I will try to test it by the end of today. Get your rest lol.

@AlfredoSequeida
Owner Author

AlfredoSequeida commented Feb 23, 2021

@Theelgirl So, I checked out the zfec-error-correction branch, but I don't see any zfec logic. The only change I see is to the requirements.txt file. Did you push your changes? Or did I misunderstand your message? I thought you implemented the logic for zfec.

@Theelx
Collaborator

Theelx commented Feb 23, 2021

Oops. I probably forgot to commit, pushing now.

Edit: done

@Theelx
Collaborator

Theelx commented Feb 23, 2021

@AlfredoSequeida Yoink I did a dumb and forgot to import random to deal with errors, git pull the most recent changes and re-test.

@AlfredoSequeida
Owner Author

AlfredoSequeida commented Feb 23, 2021

@Theelgirl OK, so I tested your implementation, and it's definitely faster, but I was not able to decode a video downloaded from YouTube. With that said, I don't think it's your implementation, because I also can't decode it without the zfec error correction.

I keep getting this:

Unziping...
Traceback (most recent call last):
  File "/home/alfredo/.local/bin/fvid", line 33, in <module>
    sys.exit(load_entry_point('fvid==1.0.0', 'console_scripts', 'fvid')())
  File "/home/alfredo/.local/lib/python3.9/site-packages/fvid/fvid.py", line 505, in main
    save_bits_to_file(file_path, bits, key, args.zfec)
  File "/home/alfredo/.local/lib/python3.9/site-packages/fvid/fvid.py", line 294, in save_bits_to_file
    bitstring = fo.read()
  File "/usr/lib/python3.9/gzip.py", line 300, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.9/gzip.py", line 495, in read
    uncompress = self._decompressor.decompress(buf, size)
zlib.error: Error -3 while decompressing data: invalid literal/length code

Here are the YouTube videos for reference:
non-zfec
zfec

And here is the file I used to test, it's a 14.6MB PDF file

I have seen this before with other files, so it looks to be a problem with unzipping the data. Something is happening there that needs to be looked at more. Maybe @dobrosketchkun can help here.

Since decoding works before the YouTube upload, it makes me think that YouTube's compression is messing with the zipped data. A simple solution that I didn't get to try, but that could work, is to apply zfec after zipping the file (instead of before) and then see if the error correction can repair the data before unzipping it during decoding.
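The proposed reordering could be sketched like this, with identity stubs standing in for the real zfec calls (hypothetical function names, not zfec's API):

```python
import gzip

def ecc_protect(data: bytes) -> bytes:
    """Hypothetical stand-in for zfec encoding; real code would add parity blocks."""
    return data

def ecc_repair(data: bytes) -> bytes:
    """Hypothetical stand-in for zfec decoding/repair."""
    return data

payload = b"contents of the file to encode"

# Encode: zip first, then add error correction over the zipped bytes,
# so compression damage from YouTube can be repaired *before* unzipping.
on_the_wire = ecc_protect(gzip.compress(payload))

# Decode: repair first, then unzip.
recovered = gzip.decompress(ecc_repair(on_the_wire))
print(recovered == payload)  # True
```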

@Theelgirl I do have a question: I noticed that you decided to apply zfec every 50 bytes (I think that's what's happening), so how many of those bytes are actually recoverable with zfec?

@dobrosketchkun
Contributor

dobrosketchkun commented Feb 23, 2021

Sorry for not being useful recently; I'm kind of busy.

I can't write it in nice code right now, but hear me out: in the first version, we had a delimiter to determine where the end of the file is; in the current one, we have the zip machinery to help us out. The original problem is that no one can tell whether a string of black pixels at the end of the last frame is part of the file or not.

But what if we revive the byte-encoded JSON approach without that issue? We could place the encoded zip inside it, and I think it may also help with #33.

import json
from bitstring import BitArray, Bits

test = {"filename": "test",
        "data": "somedata"}

data_bytes = json.dumps(test).encode('utf-8')
# b'{"filename": "test", "data": "somedata"}'

bitarray = BitArray(data_bytes)
# BitArray('0x7b2266696c656e616d65223a202274657374222c202264617461223a2022736f6d6564617461227d')

bitarray_bin = bitarray.bin
# 01111011001000100110011001101001011011000110010101101110011000010110110101100101001000100011101000100000001000100111010001100101011100110111010000100010001011000010000000100010011001000110000101110100011000010010001000111010001000000010001001110011011011110110110101100101011001000110000101110100011000010010001001111101

bitarray_bin = bitarray_bin + '1' * 8 * 5  # last frame pad emulator
# 011110110010001001100110011010010110110001100101011011100110000101101101011001010010001000111010001000000010001001110100011001010111001101110100001000100010110000100000001000100110010001100001011101000110000100100010001110100010000000100010011100110110111101101101011001010110010001100001011101000110000100100010011111011111111111111111111111111111111111111111

##############################################

bitstring = Bits(bin=bitarray_bin)
# Bits('0x7b2266696c656e616d65223a202274657374222c202264617461223a2022736f6d6564617461227dffffffffff')
decoded_bytes = bitstring.bytes
_temp = str(decoded_bytes)
# b'{"filename": "test", "data": "somedata"}\xff\xff\xff\xff\xff'

# some sketchy way to strip the padding and recover the JSON; needs to be cleaner
_temp = _temp.lstrip("b'")
_temp = _temp.split('"}')
_temp = '"}'.join(_temp[:-1] + [''])
_temp = json.loads(_temp)

# {'data': 'somedata', 'filename': 'test'}

@Theelx
Collaborator

Theelx commented Feb 23, 2021

@AlfredoSequeida Actually, with a block size of 8, zfec is applied every 8 bytes. Zfec simply expands the data by a factor of MVAL/KVAL, a 25% increase here (I'm not sure how it recovers using the original symbols, but eh). In this case, with a KVAL of 4 and an MVAL of 5, that means that for every 5 blocks of data, we need any 4 to reconstruct the original. Since it bloated the block to 10 symbols, it's easily divisible by 5.
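The k/m arithmetic above works out as follows (just the numbers from the explanation, not actual zfec code):

```python
# Arithmetic for zfec-style erasure coding with KVAL=4, MVAL=5.
KVAL, MVAL = 4, 5
bloat = MVAL / KVAL                          # 1.25 -> 25% size increase
block_size = 8                               # input bytes per block, as configured
encoded_symbols = block_size * MVAL // KVAL  # 10 symbols per encoded block
survivable_losses = MVAL - KVAL              # any 1 of the 5 shares can be lost
print(bloat, encoded_symbols, survivable_losses)  # 1.25 10 1
```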

@Akul2010

Can I help? I'm good at building GUIs in tkinter, PysimpleGui, and PyQT5. (Ok at kivy and kivymd as well.)
Sorry for the typos

@Theelx
Collaborator

Theelx commented Sep 16, 2022

Thanks for the offer; however, this project has been unofficially superseded by https://github.com/MeViMo/youbit. That repo is better in essentially every way.

@AlfredoSequeida
Owner Author

AlfredoSequeida commented Sep 17, 2022

Can I help? I'm good at building GUIs in tkinter, PysimpleGui, and PyQT5. (Ok at kivy and kivymd as well.)
Sorry for the typos

Hey @Akul2010! Thank you for offering to help! Although that would be awesome, I honestly haven't had time to work on the project recently. I think @Theelx's idea of contributing to a more active project would be best.
