Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Web Server Parsing-impl.h using StreamString to avoid String Reallocations #9005

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Lan-Hekary
Copy link
Contributor

I was getting many Reallocating large String Warnings when decoding a long URL, and I found that the length of the string is already known before allocating, so it made sense, and it made a big difference.

@mcspr
Copy link
Collaborator

mcspr commented Oct 16, 2023

Should it count % to definitely know the output size?
i.e. text.length() - (std::count(text.begin(), text.end(), '%') * 2)

@Lan-Hekary
Copy link
Contributor Author

Hello @mcspr, Thanks for the suggestion, yours is more optimized. I will edit the PR now.

@Lan-Hekary
Copy link
Contributor Author

I went into another rabbit hole as I was getting Reallocating large String messages for request arguments and headers.
Turns out that there is a pretty neat new interface called StreamString !!
I don't know who is responsible for it yet, But kudos !!

The Main culprit in the String Allocations is readStringUntil('\r') because it concatenate the string char by char.
Using StreamString still concatenate but in bulk, saving a lot of memory management.

I only made the modification to the webserver, a lot of legacy code in the core that depends on readStringUntil in general can be modified in the same way.
I am waiting on your inputs and if you want me to continue with this refactoring.
Thanks.

@Lan-Hekary Lan-Hekary changed the title Reserve the String Length in Webserver::urlDecode to avoid String Reallocations Refactor Web Server Parsing-impl.h using StreamString to avoid String Reallocations Oct 21, 2023
@d-a-v
Copy link
Collaborator

d-a-v commented Nov 4, 2023

I am waiting on your inputs and if you want me to continue with this refactoring.

Please go ahead, and thanks !

@d-a-v d-a-v added the alpha included in alpha release label Nov 4, 2023
@d-a-v
Copy link
Collaborator

d-a-v commented Nov 7, 2023

Given #9011, would you think having sendUntil(to, untilString) would help ?

@Lan-Hekary
Copy link
Contributor Author

Lan-Hekary commented Nov 8, 2023

@d-a-v Yes, think it would help too much.
Right now I use client.sendUntil(req, '\r'); to send the string minus the carriage return character then client.readStringUntil('\n'); to dummy read the new line character.
If I can use a string terminator "\r\n" this would further simplify the call to only one.
But I fear it may introduce some complexity in the processing that defeat the purpose of the refactoring.
Character comparison is faster than string comparison. and in case of HTTP Protocol you can always expect \r\n, so I guess it's better from performance standpoint to use Character comparison.

Upon further look of #9011 the implementation is based on character comparison, and I think it would not hurt performance.
if we can do the same in sendUntil it would help.

@d-a-v
Copy link
Collaborator

d-a-v commented Nov 17, 2023

if we can do the same in sendUntil it would help.

Right. It was made from readStringUntil so the same api extension would indeed be interesting.

@Lan-Hekary
Copy link
Contributor Author

Hello @d-a-v ,
I tried to implement the same API for StreamStrings and then Return String at the end.
Please take a look at my attempt, it's still under test, I did some testing and so far it's working, I am afraid if there is any edge case I missed, so help me to test if you can.
I am waiting on your feed back, then I will clean the code up for merge if you approve.

@@ -288,6 +289,59 @@ String Stream::readStringUntil(const char* terminator, uint32_t untilTotalNumber
return ret;
}

String Stream::readStreamString(const ssize_t maxLen ,const oneShotMs::timeType timeoutMs) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly the same as ::sendSize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not Exactly, It constructs a String Object and a Stream Object attached to it, then populate it, then returns the String Object, it's behavior is more like ::readString

`sendSize' expect you to pass a Pointer to a Stream and returns the total number of bytes.

readStreamString is just a wrapper for the String and Stream contruction.

Copy link
Collaborator

@d-a-v d-a-v Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed not exactly the same.
The goal of the send* functions is to handle and accelerate transfers for any kind of stream like StreamString are.

(edited:)

line = client.readStreamString(size); // (no copy)

can also be written as

client.sendSize(S2Stream(line), size); // no copy (not tried though, meant to work, temporary is assumed)

Anyway this function is not used in the following of your pull request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, your version is much smaller 😄
My version is not used indeed, but made it available so that we have a Uniform API like 'readString*' but with streams.

Please advise on the best course of action, should I remove the unused functions ?
Do you have a better Idea to make the API more intuitive ?

return ret;
}

String Stream::readStreamStringUntil(const int readUntilChar, const oneShotMs::timeType timeoutMs) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as ::sendUntil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return ret;
}

String Stream::readStreamStringUntil (const char* terminatorString, uint32_t untilTotalNumberOfOccurrences, const oneShotMs::timeType timeoutMs) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For full genericity with any kind of stream, we should rather extend ::sendGeneric to accept a char* instead of a char.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We Should Do that, but I am having a hard time making this modification.
This Function will be much simpler if we can do that.
But The basic Idea here is to match the API of readStringUntil for single char in the first function and Terminator String in this function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For full genericity with any kind of stream, we should rather extend ::sendGeneric to accept a char* instead of a char.

Having tried implementing that...

  • we'd have to change API from char to cstr*?
  • 'basic' streams have to have some kind of logic jump to stall writes until delimiter is actually read, since the current one simply pushes char-by-char
  • 'peekBuffer' streams don't have the issue above, but still have to have some kind of extra state management to track delimiter

Still, original Stream methods could be improved? idk if users care about internals here, only that String is the end result.
i.e. master...mcspr:esp8266-Arduino:strings-webserver/pr9005 as a small experiment where we don't care about the type of Stream, and utilize String buffer as a window into delimited data
(which btw would extend to some other internal libs, not only webserver)

mcspr added a commit to mcspr/esp8266-Arduino that referenced this pull request Jul 26, 2024
impl based on esp8266#9005, but for existing Arduino methods
mcspr added a commit to mcspr/esp8266-Arduino that referenced this pull request Jul 26, 2024
impl based on esp8266#9005, but for existing Arduino methods
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
alpha included in alpha release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants