furion's new toy: A full RPC steemd node for SteemData


I've finally set up my own full-RPC steemd node on a 6-core Xeon server with 256GB of ECC DDR4 RAM, in a datacenter near SteemData (connected over a private 1 Gbit network with sub-1ms latency).


Running a full node adds maintenance workload, but it no longer seems avoidable. I hope that this new deployment, with @gtg's node as a backup, will improve the speed and reliability of SteemData services.

Public Steemd Nodes

Steemit's official nodes have been rock solid over the past month and have served well as a backbone for many of my services. I have also used @gtg's nodes extensively, since they are hosted in the EU.


I am really happy about the proliferation of community-run full steemd RPC nodes; however, I haven't had a chance to test them extensively yet.

Why a Private Node?

I currently run three databases as a service, and try to keep steemd's internal state synced to the main SteemData database. I'm also syncing up the new databases (hive, sbds) from scratch. All in all, I'm currently making millions of requests to steemd instances every day.

Unfortunately, the SteemData servers are located in Germany, which adds a fair amount of network latency to most of the public nodes I tested. The per-request latency, along with limits on available throughput, caused problems: the database indexers could not catch up with the blockchain head.
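
For a rough sense of scale, here is a back-of-envelope sketch in shell; the request count and round-trip times below are assumptions for illustration, not measurements:

REQS=2000000   # hypothetical sequential requests per day (assumption, not a measurement)
echo "remote node, ~20 ms round-trip: $(( REQS * 20 / 1000 / 60 )) minutes/day spent waiting on the network"
echo "local node,   ~1 ms round-trip: $(( REQS * 1 / 1000 / 60 )) minutes/day spent waiting on the network"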

Why 256GB of RAM?

It is possible to run RPC nodes on lower-spec hardware, but unfortunately my needs require a fully specced-out setup.

Reducing memory usage by selectively enabling features
It is possible to run RPC nodes with lower requirements. For one, not every app needs all the plugins: an app like Busy or DTube doesn't need the market_history plugin, for example.
Secondly, it's possible to blacklist certain operations from being indexed by the account_history plugin, which can also drastically reduce memory usage.
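
As a sketch only (not my actual config), a node serving such an app could simply drop the market_history and raw_block entries from the API/plugin lists shown in the Setup section below:

# Illustrative trimmed plugin set for an app that doesn't need market data
public-api = database_api login_api account_by_key_api network_broadcast_api tag_api follow_api
enable-plugin = witness account_history account_by_key tags follow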

The point of SteemData is to process and store all the available information, so these optimizations do not apply.

Using an SSD instead
Without high-throughput and low-latency requirements, it's possible to place the shared memory file on an SSD. By doing so, a full RPC node could be hosted on a server with as little as 16GB of RAM.
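
As an illustration (the path is made up; the options are the same ones used in my config below), this just means pointing the shared memory file at an SSD-backed directory instead of a ramfs mount:

# Illustrative: shared memory file on an SSD-backed path instead of ramfs
shared-file-dir = /ssd/steem
shared-file-size = 200G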

SteemData makes a lot of arbitrary requests, and to stay near real time in state synchronization, throughput and latency are crucial. That is why I need all of the state mapped into RAM, and the node hosted in the same datacenter as the rest of the SteemData servers. This setup is overkill during normal operations, but very much needed when syncing up from scratch.

Setup

I've made a custom Docker image, based on Steemit's (Dockerfile, run-steemd.sh).

I've assigned 200GB of 'ramdisk' for the shared memory file, using ramfs, with the following fstab entry:

ramfs /dev/shm ramfs defaults,noexec,nosuid,size=210GB 0 0

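To apply and verify the mount without a reboot:

mount -a                 # pick up the new fstab entry
mount | grep /dev/shm    # confirm /dev/shm is now a ramfs mount
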
I've adopted @gtg's awesome full-node config as a base and tweaked it a bit.

rpc-endpoint = 0.0.0.0:8090
p2p-max-connections = 200
public-api = database_api login_api account_by_key_api network_broadcast_api tag_api follow_api market_history_api raw_block_api
enable-plugin = witness account_history account_by_key tags follow market_history raw_block

enable-stale-production = false
required-participation = false
shared-file-size = 200G
shared-file-dir = /shm/steem

seed-node = 149.56.108.203:2001         # @krnel (CA)
seed-node = anyx.co:2001                # @anyx (CA)
seed-node = gtg.steem.house:2001        # @gtg (PL)
seed-node = seed.jesta.us:2001          # @jesta (US)
seed-node = 212.117.213.186:2016        # @liondani (SWISS)
seed-node = seed.riversteem.com:2001    # @riverhead (NL)
seed-node = seed.steemd.com:34191       # @roadscape (US)
seed-node = seed.steemnodes.com:2001    # @wackou (NL)

Lastly, I run everything in Docker.

docker run -v /home/steem_rpc_data:/witness_node_data_dir \
           -v /root/fullnode.config.ini:/witness_node_data_dir/config.ini \
           -v /dev/shm:/shm \
           -p 8090:8090 -d \
           furion/steemr
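
Once the container is up, sync progress is visible in the logs, and the RPC endpoint can be poked directly. The JSON-RPC call below assumes the endpoint accepts plain HTTP POST as well as WebSocket connections (if not, a WebSocket client such as wscat can be used instead):

docker logs -f $(docker ps -q --filter ancestor=furion/steemr)    # watch sync progress
curl -s -d '{"jsonrpc":"2.0","id":1,"method":"call","params":["database_api","get_dynamic_global_properties",[]]}' http://localhost:8090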

Conclusion

This setup is fairly new (in production for less than a day), but the results are already promising. Syncing is more than 100x faster than using remote nodes, and I haven't run into any throughput limitations yet. As long as the node doesn't crash, things should be golden.


Keep up the good work @furion

For those who might think that such a private RPC node doesn't serve the network: of course it does. It will reduce the load on other public nodes, leaving those resources for people who can't run their own private nodes.

Yup, I've been depending on Steemit's nodes for months, and by moving to my own node, Steemit's workload should be a bit lower.

Furthermore, the whole point of SteemData is to avoid hammering steemd, since querying databases can be many orders of magnitude more efficient.

True. 15,000,000 requests vs one single query to SteemData.
That makes a difference.

@furion, you could probably take a look at steemd.com, which has been down for the last 7 hours... you guys do an awesome job securing the blockchain; hopefully your initiative can encourage others so we all get an overall better experience... also have a look at steemdb and steemd, which can vary or lag behind (especially steemdb); your initiative can probably help improve the services offered by others.

SteemData is open and free to use, but unfortunately it's not an HA (high-availability) setup.
I do have ~200 active connections on MongoDB alone, from various users and apps.

The database has been rock solid so far, and there is a fairly easy scalability path:

  • add more RAM to the database server for higher performance
  • create a replica cluster for HA & backup purposes

The weak points in SteemData right now are:

  • node stability/availability
  • bugs in my code

Still, impressive results, as the system has been stable while supporting so many applications. I particularly like SteemData, as it helps me navigate my own feed in case I miss anything and helps me monitor active users... perhaps we could get a more dedicated team on the nodes, since they go down sometimes, and this outage has been lengthy; I'm not too sure of the overall effect it has on the platform... apart from that, fantastic work...

SteemData is open and free to use, but unfortunately it's not an HA (high-availability) setup.

Are you making SteemData available as a public RPC endpoint too? From the article it seems you plan to keep it private, at least for now ("Why a Private Node?"). If you do plan to make it public at some point, I'd love to add it to my list of "go-to servers" as well. Regardless, it's another great addition that you bring to the community.

On a separate note regarding @gtg's endpoint, you left SSL as a question mark. However, I can attest that gtg.steem.house requires SSL / wss, despite the use of an alternate port (8090).

We can't keep adding RAM to servers for higher performance. At the current stage you are running a 256GB node (the kind Oracle uses for their DB and JS apps) and we are serving fewer than 400k users. Does that mean that with 1 million users, Steem will need 768GB of RAM?

I hope SteemData takes the load off steemd in the proper way; otherwise we are facing a strong facepalm in the near future.

I really like the story you've shown.
I hope you're happy with what I've said; it's nice to be able to see it.

lol, which language is it? :D hahaha

...6 core Xeon server with 256GB of ECC DDR4 RAM

Are you still working on Viewly @furion?

Yes, I am.

I've recently been taking some time to fix and improve my Steem services and witness infrastructure.

I've been fortunate enough to have a small team helping me with Viewly, so things are still progressing.

Very pleased to hear all of that. ^

Holy s.it! 256GB RAM, that's quite a big server you're running there. Congrats. J

One could get away with a lot less, but I want it to be as fast as possible.

RAM on steroids :D

Well done and congratulations, 256GB of RAM is a very fast one!
Can you please let me know if there are any updates for Viewly... I did sign up for the newsletter but no news so far, plus I did not notice a way on the site to create an account and start adding videos!
Thanks a lot and have a great day! 😉

Maybe it went into spam. Feel free to join us on Telegram.

Maybe it went there... to be honest, I never look at the spam section. I've been using Telegram for a while, so can you please let me know how I can join you there...
Thanks a lot!

Good stuff, man. Looks like you'll be able to use the rewards from this post alone to run the RPC server for at least a month or two at no cost. Go Steem!

Great idea. I would like to see more, hopefully with a dumbed-down explanation of what some of the components are. The technical terms make it hard to follow, but from what I understood, I'm impressed!

thanks 4 sharing

Interesting setup. I'd like to set up something similar very soon. Good work :)

I have a 12-core Xeon workstation with 38GB of ECC RAM, a Dell T5500. Can I run a node from Ireland?